Abstract

This paper presents a probabilistic analysis of plausible reasoning about defaults and about likelihood.
“Likely” and “by default” are in fact treated as duals in the same sense as “possibility” and
“necessity”. To model these four forms probabilistically, a logic QDP and its quantitative counterpart
DP are derived that allow qualitative and corresponding quantitative reasoning. Consistency
and consequence results for subsets of the logics are given that require at most a quadratic number of
satisfiability tests in the underlying propositional logic. The quantitative logic shows how to track the
propagation of error inherent in these reasoning forms. The methodology and sound framework of the
system highlight their approximate nature, the dualities, and the need for complementary reasoning
about relevance.

Index Terms: default, likelihood, plausible reasoning, qualitative reasoning, subjective probability. 
1 Introduction

Default reasoning is a form of non-monotonic reasoning, which Delgrande [1] introduces as
follows:


Many common sense assertions about the real world express default or prototypical properties 
of individuals or classes of individuals, rather than strict conditional relations. Thus, for 
example, “birds fly” attributes the property of flight to birds, even though birds with broken 
wings generally don’t fly, and quite probably no penguin flies. The import of “birds fly” then 
certainly isn’t that all birds fly, but rather is more along the lines of “typically birds fly”. 

This form of default reasoning then is concerned with drawing “typical” conclusions. There is a continuously
growing and diverging variety of theoretical treatments of this and other forms of non-monotonic
reasoning [2, 3, 1, 4, 5, 6, 7, 8].

Likelihood reasoning, another form of plausible reasoning, is more concerned with drawing “likely” 
conclusions. For example, it is “likely” or reasonably possible that a coin tossed twice will land heads 
both times, although this certainly is not “typically” the case. It is not “likely”, however, that a coin
tossed twice will land on its side one of those times. Although the laws of physics might treat this as 
a “possible” outcome, for most practical purposes it is not. When one is considering possible outcomes, 
rather than looking at all, likelihood reasoning is intended to be applied to find only those outcomes that 
are reasonably possible. A historical perspective and further discussion for this form of reasoning can be 
found in [9]. 

The relationship between probability and plausible reasoning is best introduced by Polya [10, Chapter
XV]. In his work on reasoning in mathematics, Polya introduced a system of guides to the mathematician
of the form:


[Given a conjecture,] the verification of a consequence renders the conjecture more credible. 

Our confidence in a conjecture can only increase when an incompatible rival conjecture has 
been exploded. 

These guides were based on belief about conjectures modelled as subjective probabilities. Plausible reasoning
about default and likelihood, however, has more often been modelled in AI using purely logical
formalisms [5, 1, 6, 9] or non-standard probabilistic methods [11, 12, 13], although probability-motivated approaches
exist [7, 14]. Another form of reasoning seen in areas such as qualitative physics and model-based
diagnosis systems is the qualitative and approximate reasoning about physical devices. In this paper we
combine the two paradigms, probabilistic and qualitative/approximate, to model default and likelihood
reasoning, and so take up Polya’s theme more fully.

Surprising to some, it is controversial whether these plausible reasoning forms can be modelled with
probabilities¹ [15,4,16]. Likelihood reasoning and some forms of default reasoning, however, will always
remain problems of uncertainty or incomplete information. With some forms of non-monotonic reasoning, 
such as the closed-world assumption used in database systems and PROLOG, uncertainty does not exist 
because the default is actually a convention. These exceptions aside, there comes a time when something 
that is currently “typically” or “likely” to hold becomes known true or false. Until that time, we are in 
a state of uncertainty. However well logical systems may cope with modelling these reasoning forms, we 


1 Cheeseman has said [15, p1002]:

Unfortunately, the logical style of reasoning is so prevalent in AI that many have attempted to force intrinsically
probabilistic situations into a logical straight jacket with predictable limited success.
should at least see how they can be modelled by a theory of uncertainty like probability. Perhaps there is
more to learn? 

This paper follows the view that subjective Bayesian probability theory provides a benchmark against 
which methods for reasoning about uncertainty can be compared. The theory is a normative theory 
of reasoning about uncertainty, which means it gives a prescription for how uncertain reasoning should 
be done. The prescription itself has been derived from a set of fundamental axioms about belief (an 
introduction to this in the AI context is in [17]). One can model default and likelihood reasoning as either 
qualitative or quantitative approximations to full normative probabilistic reasoning. One can then argue 
that the resulting model seems to exhibit the required properties, and compare the model with some 
existing methods. 

A logic QDP (a mnemonic for qualitative default probabilistic logic) is developed here from a suitable
quantitative counterpart DP as a demonstration. This yields a probabilistic system as a canvas on which a 
number of more significant issues can be sketched. These issues are: (1) the interplay between quantitative 
and qualitative forms of plausible reasoning, (2) the duality between default and likelihood reasoning, (3) 
the approximate nature of these reasoning forms, for instance, the propagation of errors in reasoning, and 
(4) the need for complementary reasoning about, for instance, relevance. 

The logic QDP, being probabilistically based, is easily able to express sentences² such as “most birds
fly”. This uses a “default” conditional-style operator “=>”, as in: Bird(x) => Flies(x). Similarly, “an
Australian is likely to drink Foster’s”³ can be represented with a “likely” conditional-style operator “≻”,
as in: Australian ≻ Drinks-Foster's. This operator also has iterated forms indicated by numeric superscripts,
such as “≻^2”, that express lesser degrees of likelihood, as in: Australian ≻^2 Drinks-another-Foster's,
which expresses the fact that, at least occasionally, an Australian will drink even more Foster’s.

Surprisingly enough, QDP is also able to express sentences more in the spirit of autoepistemic [18] 
and default logics [2]. We can interpret the sentence “a professor has a Ph.D. unless known otherwise” 
two ways: 


◇(Prof(x) ∧ Phd(x)) → (Prof(x) => Phd(x)) ,
◇(Prof(x) ∧ Phd(x)) → □(Prof(x) → Phd(x)) ,

where the “□” operator represents necessity interpreted as “known with certainty”, and the dual “◇”
operator represents possibility interpreted as “the negation is not known with certainty”. These are read as “if it
is possible that a particular professor has a Ph.D., then the professor most likely has a Ph.D.”, and “if it is
possible that a particular professor has a Ph.D., then the professor definitely has a Ph.D.” respectively. The
default logic representation, from Prof(x) ∧ M Phd(x) infer Phd(x), corresponds to the second reading.
So the possibility operator, “◇”, behaves rather like the M operator of default logic.

The default component of the logic QDP is a variant and extension of Adams’ conditional logic 
[19], applied to default reasoning by Pearl [7]. The probabilistic semantics of QDP differs slightly from 
Adams’ logic, however, because QDP is developed as a qualitative model for order of magnitude reasoning
about probabilities, rather than being based on infinitesimal arguments. Like Adams’ logic, QDP can
be combined with a notion of relevance or causality to resolve the so-called default paradoxes: the Yale
shooting problem [8] and “can Joe read and write?” [7]. The logics also resolve the “vanishing subclasses”
paradox [20]. These three paradoxes are discussed in Section 5. A fourth paradox is the lottery paradox 
[3], considered in Section 3. This has a version both in default and likelihood reasoning, and provides an 
example of the propagation of errors inherent in these reasoning forms. 

The logics have been modelled after Delgrande’s modal conditional logic NP, which allows reasoning
about default rules. Likewise, reasoning about defaults and likelihood is an important feature of the 


2 Although propositional sentences are dealt with throughout, pseudo-first-order sentences will sometimes be used. They
are effectively propositional if there are known to be a finite number of constants, no quantifiers are allowed, and a sentence
with variables is intended to represent a sentence schema.

3 For the record, many Australians don’t. Some drink XXXX, others Swan, ....
approach here. For example, suppose you know a friend has travelled to Australia. Then they are likely
to have visited Sydney. Although any visitor to Sydney will typically see the Sydney Harbour Bridge,
it is only likely that they will visit Bondi Beach. We can infer that your friend is likely to (rather than
“typically”) have seen the Harbour Bridge but is less likely to have visited Bondi. In QDP, this argument
can be summed up as follows:

true ≻ Visit-Sydney(Bruce)

Visit-Sydney(x) => See-Harbour-Bridge(x)

Visit-Sydney(x) ≻ Visit-Bondi(x)

|=_QDP (true ≻ See-Harbour-Bridge(Bruce)) ∧ (true ≻^2 Visit-Bondi(Bruce)) .

Consistency and consequence tests developed for subsets of the default and likelihood components of the 
logics also show how this form of reasoning can be automated in a manner requiring at most a quadratic 
number of satisfiability tests in the underlying propositional logic. With a careful choice of the underlying 
propositional logic, the operation can then be quite efficient. 

Perhaps most significantly, this reasoning can be easily complemented with error tracking facilities 
to indicate when the conclusions from a chain of such plausible reasoning may be becoming doubtful. For
instance, it is shown in some circumstances that error when reasoning about defaults can increase at most 
additively, while error when reasoning about likelihood can increase multiplicatively. It is not claimed, 
however, that these tracking facilities are a substitute for a more thorough probabilistic approach; they 
are merely an approximation. 
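As a rough illustration of these two growth rates, assume the numeric readings used in Section 4: a default from A to B with error bound ε constrains Pr(B|A) > 1 − ε, and a likelihood statement with lower bound ε constrains Pr(B|A) ≥ ε. The symbols ε_1 and ε_2 and the proposition C below are illustrative only, and Pr(A) > 0 is assumed. Two default conclusions B and C drawn from the same knowledge A then have a combined error that is at most the sum of the individual errors, whereas chaining two likelihood statements, as in the Sydney example, multiplies the lower bounds:

```latex
\begin{align*}
\Pr(\neg(B \wedge C) \mid A)
  &\le \Pr(\neg B \mid A) + \Pr(\neg C \mid A) < \epsilon_1 + \epsilon_2 , \\
\Pr(\mathit{Bondi})
  &\ge \Pr(\mathit{Bondi} \mid \mathit{Sydney}) \, \Pr(\mathit{Sydney})
   \ge \epsilon_2 \, \epsilon_1 .
\end{align*}
```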

The paper proceeds as follows. First, the philosophical problem of modelling default reasoning
with probabilities is considered in Section 2. The corresponding discussion for likelihood reasoning is not
given here, because the principal objections in AI to modelling likelihood reasoning with probabilities do
not centre around the use of probability theory at all, but whether the modelling should be qualitative or
quantitative [9], and both are done here. A basic probabilistic framework for plausible reasoning is then
proposed in Section 3. Two logics, one with a probabilistic semantics, DP, and a qualitative version, QDP,
are then introduced in Section 4. Here, the duality between default and likelihood is introduced, and the 
consistency and consequence results are developed. Section 5 demonstrates a methodology for applying 
the qualitative logic, using relevance, and Section 6 draws some comparisons with other probabilistic 
approaches. 

2 On Modelling Default Reasoning with Subjective Bayesian 
Probability 

Non-monotonic reasoning is generally considered to have three broad forms [4,18]: autoepistemic reasoning
is reasoning about self-knowledge of beliefs [18], for instance, “if I had an older brother I would know
about it”; conventions are used in the interpretation of natural language and with the closed-world
assumption often made for database systems; and typicality or default reasoning is the form discussed
in the Introduction.

To illustrate the use of convention in natural language, consider the sentence “birds lay eggs” [16],
which is certainly not true for the male half of the bird population. The sentence is more accurately stated 
as “[female] birds lay eggs [to reproduce]”. The parts in the square brackets are implicit. Most people 
realise that male birds cannot lay eggs, so in the interests of brevity, the speaker leaves “female” to be 
inferred from the remainder of the sentence. This implicit convention is handled in nonmonotonic systems 
using knowledge of the form “an X is a Y unless known otherwise”. As illustrated in the introduction, this 
form can also be represented in a probabilistic framework using the probabilistic version of the possibility 
and necessity operators. 

When modelling the third form, typicality or default reasoning, we are hampered by the fact that 
there is little consensus as to its exact nature [20]. Hanks and McDermott [8] say,


While it is not entirely clear exactly what constitutes default reasoning, the phenomenon 
commonly manifests itself when we know what conclusions should be drawn about typical 
situations or objects, . . . 

Neufeld, Poole and Aleliunas [20] make an even stronger statement. They say, 


What, then, does a default mean? Within the default logic camp, we know of no work which 
provides a semantics for defaults, in the sense that an experiment is described that can be 
performed in the semantic domain to verify the truth of a default. 

However, there is general agreement that default reasoning is a form of “defeasible inference”, or “plausible
reasoning” [21,8], and that default conclusions have some (often small [21]) degree of uncertainty to them.

Given that default reasoning is an admittedly specialised form of reasoning under uncertainty, it is 
natural to pose the question: can probability theory model default reasoning (see also [7])? Critics of a 
Bayesian approach claim that probabilities are just not suited for describing “prototypical” knowledge. 
Most arguments, however, are based on some misunderstanding.

Nutter [16] gives the following argument: 


For instance: if . . . the by now tormented example “Birds fly” really means “Most birds fly”,
then birds don’t fly in spring. In the nesting season, baby birds outnumber adults. Baby birds 
don’t fly. Hence in the nesting season, “Most birds fly” is false. 

To the Bayesian, “Most birds fly” is interpreted as “if we know nothing else about a particular bird, then
that bird most likely flies”. Notice the “most likely” conclusion is conditioned on our current knowledge
about the bird. In particular, if we know it is nesting season, we cannot conclude the bird most likely flies
because we do now know some additional thing about the bird. Two rules are relevant to the situations
Nutter gives: “Most birds fly” and “In the nesting season, most birds don’t fly”. If we do not know that
it is the nesting season, then the first rule is applicable because it usually is not the nesting season. The 
importance of conditioning probabilistic statements with context or current knowledge is a key feature of 
probabilistic reasoning and the cornerstone of the subjective Bayesian approach. 

McCarthy addresses a similar concern [4, p92].


Note that the general probability that a bird can fly may be irrelevant, because we are in- 
terested in the facts that influence our opinion about whether a particular bird can fly in a 
particular situation. 

Classical statistics, with its concern about long term frequencies and sample spaces, can have problems in
adapting general knowledge to specific situations. The ability to adapt knowledge to particular situations,
however, is a hallmark of Bayesian methods. In this case, suppose we know that the bird is a male yellow-bellied
warbler, but we have no knowledge at all about this type of bird, or even what they may be similar
to. The only relevant knowledge we have is the general probability statement that most birds fly. In the
absence of information to the contrary, we assume that other details about the bird are irrelevant (this is
the maximum entropy argument [7]), which leads us to the quite reasonable conclusion that most male 
yellow-bellied warblers fly. We can now reason about this particular bird. 

There are, however, strong arguments that default reasoning should be modelled by probability with
caution. In practice, an intelligent system may not be able to supply precise probabilities for its beliefs
and may not be able to perform all the exact calculations required to maintain its beliefs in accord with
Bayesian principles as new evidence becomes available. People certainly cannot. It is of course not just 
the computation that causes problems but the communication required to prime and then update an 
intelligent system with an adequate set of beliefs. 

The normative properties of Bayesian theory assure us that despite these problems, by trying to
approximate the Bayesian approach our reasoning at least remains approximately rational. Essentially, it 
is the best we can do in an inherently imprecise and computationally complex world. This view has been 
supported in AI alone in a range of areas [22,23,24,25]. 

3 A Framework for Plausible Reasoning 

In this section, a basic framework for default and likelihood reasoning is developed. These two forms of 
reasoning are referred to below as plausible reasoning. Before presenting the framework, we first consider 
some major features of plausible reasoning, and then infer properties that a plausible reasoning system 
should have. 

3.1 Basic features of plausible reasoning 

There are several basic features of plausible reasoning that must affect the design of a plausible reasoning
system. While these can be derived from the probabilistic model presented in the next section, the features 
are presented here independently of any probabilistic analysis. 

Plausible reasoning is non-monotonic 

With standard logical reasoning, conclusions derivable from a set of sentences increase monotonically as
the set of sentences is extended. That is, if S logically implies C, and we extend S with A, then S ∧ A
also logically implies C.

Default reasoning is known to be non-monotonic [3]; the above monotonicity property breaks down. 
So while you might well believe that birds fly, on discovering that a certain bird is a baby bird in nesting
season, you would no longer believe that particular bird flies. So your set of beliefs has extended one
way but contracted another. Similarly, something that initially seems likely can become, with changing 
circumstances, well nigh impossible. 

Error combines along a chain of plausible reasoning

A second key feature of standard logical reasoning is that if the premises are known to be true, then the 
conclusion from a long chain of reasoning steps must also be true. With plausible reasoning, however, 
there is an inherent element of uncertainty involved, so it is natural to suspect this key feature might 
break down.

The famous lottery paradox [3] is an excellent example of this. For a single lottery entrant, Leslie say, 
one can conclude by default that Leslie will not win the lottery. But we can apply this sort of reasoning 
to every potential lottery entrant. There are two paradoxes here. First, why is it that someone actually
wins the lottery? Second, why does Leslie bother to enter the lottery in the first place?

For a lottery with one million entrants, the default conclusion about Leslie has an obvious statistical 
error of one ten-thousandth of 1%, acceptable by most standards. If we make a logical deduction based 
on one million such default conclusions, the one million errors certainly combine to give a total error of 
100% (after all, someone definitely wins the lottery). That Leslie would enter the lottery seems as much
irrational behaviour due to the effect of large sums of money, as it is the result of plausible reasoning.
Perhaps it is because most people do not mind losing one dollar just to be given the remotest chance of 
winning one million dollars. In the former, their life is no different; in the latter, well . . . 

This last point anticipates the next basic feature of plausible reasoning. 
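The arithmetic of the lottery example can be checked with a short sketch. It assumes the simple additive (union) bound on the error of a conjunction of default conclusions, capped at 100%; the function name is illustrative and does not come from the paper.

```python
from fractions import Fraction


def additive_error_bound(error_per_conclusion, conclusions):
    """Upper bound on the error of combining several default conclusions.

    Each conclusion is wrong with probability `error_per_conclusion`;
    by the union bound the errors at most add, and no error exceeds 1.
    """
    return min(Fraction(1), error_per_conclusion * conclusions)


entrants = 1_000_000
single_error = Fraction(1, entrants)  # chance that one given entrant wins

# One default conclusion: "Leslie will not win".
one = additive_error_bound(single_error, 1)

# One million default conclusions combined: "nobody wins".
everyone = additive_error_bound(single_error, entrants)

print(one * 100)       # 1/10000, i.e. one ten-thousandth of 1%
print(everyone * 100)  # 100, i.e. an error of 100%
```

Exact fractions are used instead of floating point so that the one-in-a-million figure is not rounded.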
Plausible reasoning is affected by the decision context

After a system performs plausible reasoning, it would typically decide some course of action. As a result of 
the action, the system might make some gain or incur some loss. For Leslie in the lottery situation above 
the potential loss is one dollar while the potential gain is one million minus one dollars. This feature of 
reasoning is referred to as the decision context and the losses and gains as the utilities. 

Shoham provides the following illustration of how the decision context can affect plausible reasoning.


. . . think of making the default inference “people you’ll meet on the street will not stab you 
in the back” in a city in which only 5% of the population are back stabbers. In this case the 
relatively small chance of being hurt seems to outweigh the computational resources needed 
to reason about individual people on the street, and the discomfort of wearing a steel-plated 
vest. Notice that if the 5% dropped to 0.00000000005%, we’d take off the armor and stop 
looking darkly at passers by. 

Clearly, the decision context should be taken into consideration (see also [26]). 

3.2 Basic properties of a plausible reasoning system 

The above features can be used to argue that a method for plausible reasoning should have certain basic 
properties. 

A first property is that plausible reasoning needs to be sensitive both to the current knowledge of 
the system and to the decision context. This is directly suggested by the features given in the previous 
subsection. Sensitivity to the decision context can be handled by targeting a default system for a single 
decision context. 

Now the number of different states of knowledge is potentially exponential in the number of propo- 
sitional symbols. So a system could not reasonably keep separate default rules for each possible state 
of knowledge and decision context. To get around this problem, a second property seems important: it 
should be possible to reason about plausible rules and the relevance of different facts to the applicability of 
a plausible rule. It may also be useful to give a system the ability to compile plausible rules from some 
more fundamental knowledge form. 

Third, because of non-monotonicity and error propagation, plausible conclusions need to be flagged as 
such, and should not be confused with the current knowledge. In fact, because of the possible need for
weighing up belief when combining error or considering the decision context, plausible conclusions may
need to be tagged with some form of qualitative or quantitative measure of belief. Whether and how this
is done surely depends on the application concerned; no single approach will be favoured in this paper.


3.3 A probabilistic framework 

It is beyond the scope of this paper to cover the basic notions of probability and decision theory underlying 
subsequent sections. Suitable introductions from an AI perspective can be found in [26,27,7]. The problem 
of the decision context in plausible reasoning is side-stepped here by assuming that a default system is 
being prepared for a specific binary (yes/no) decision. In this simple case, a decision has to be made 
whether some condition, A say, is “true” or “false”. Once utilities of the problem are taken into account, 
the problem invariably reduces to “is Pr(A) > p?” for some p ∈ [0, 1]. Given a particular decision
context for a binary decision, we can therefore use approximate inequality reasoning to make decisions in 
a normative manner. 
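The reduction to a threshold test can be sketched as follows. This is a minimal illustration rather than a construction given in the paper: the function names and the four utility parameters are hypothetical, and it assumes that acting on A is strictly better than not acting when A is true and strictly worse when A is false. Comparing the two expected utilities then gives the threshold p = loss / (gain + loss).

```python
from fractions import Fraction


def decision_threshold(u_act_true, u_act_false, u_pass_true, u_pass_false):
    """Threshold p such that acting on A is preferred exactly when Pr(A) > p.

    u_act_true / u_act_false: utility of acting when A is true / false.
    u_pass_true / u_pass_false: utility of not acting when A is true / false.
    """
    gain = Fraction(u_act_true) - Fraction(u_pass_true)    # benefit of acting if A holds
    loss = Fraction(u_pass_false) - Fraction(u_act_false)  # cost of acting if A fails
    if gain <= 0 or loss <= 0:
        raise ValueError("each action must be strictly better in one of the two states")
    return loss / (gain + loss)


def should_act(pr_a, threshold):
    """The binary decision: act as if A were true when Pr(A) exceeds the threshold."""
    return pr_a > threshold


# Lottery figures from Section 3.1: lose one dollar, or gain one million minus one.
p = decision_threshold(u_act_true=999_999, u_act_false=-1,
                       u_pass_true=0, u_pass_false=0)
print(p)                                     # 1/1000000
print(should_act(Fraction(1, 1_000_000), p))
print(should_act(Fraction(1, 100), p))
```

With these figures the threshold is one in a million, so an entrant whose belief in winning is exactly one in a million is indifferent in purely monetary terms.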

The notion of probability used here is subjective probability, which is a measure of belief ascribed to
some proposition by an intelligent system. This is represented as Pr(A|B) ∈ [0, 1], interpreted as follows:
a particular intelligent system, on knowing just B, has a measure of belief Pr(A|B) in A being true. The
“|” operator is called the conditioning operator. Its left hand side is the proposition whose belief is being
considered and its right hand side specifies all current knowledge relevant to A of the intelligent system.
A probability distribution is a particular function Pr consistent with the standard axioms of probability
theory.

A probabilistic framework for plausible reasoning is based on the assumptions that (1) plausible state- 
ments that are uncertain should be interpreted in some way using subjective probability statements, and 
that (2) methods of plausible reasoning which deal with uncertainty should be interpreted as approxi- 
mations to subjective probability or decision theory. We shall treat a default conclusion as a plausible 
proposition in which one has “sufficiently high belief”. Similarly, a likely conclusion is a plausible proposition
in which one has “belief that it is reasonably possible”. In both cases, the belief is modelled as
subjective probability and should be conditioned on current knowledge using the conditioning operator. 
Due to the decision theoretic argument above, both these types of plausible reasoning should, in many 
cases, be a good approximation to the normative probabilistic approach. 

Notice that this rough probabilistic interpretation of defaults and likelihood automatically provides a 
framework which addresses the basic properties of plausible reasoning discussed in this section. Decision 
theory provides the basis for considering the decision context. The conditioning operator provides the 
mechanism for making plausible conclusions sensitive to a system’s current knowledge and for keeping 
plausible conclusions (on the left hand side) separate from current knowledge (on the right hand side). 
Probability theory also provides the potential for developing ways of reasoning about plausible rules, 
and with the notion of independence, ways of reasoning about relevance. Some of these connections are 
explored more fully in the next section. Finally, probability theory provides a framework for both testing 
and developing default rules for a given application, for instance, by learning them from examples. 


4 Default Probabilistic Logic 

This section introduces two logics for default and likelihood reasoning: a probabilistic logic DP and 
its qualitative counterpart QDP. These are applicable in the broad framework given in Section 3 for 
reasoning about defaults and likelihood. Notation and semantics of these logics are first covered in 
Sections 4.2 and 4.3. Some basic properties of the logics are then outlined. One theme of this paper is the 
importance of reasoning about relevance; Section 4.5 motivates this and shows how relevance information 
can interface with default and likelihood reasoning. Another theme of the paper is the approximate nature 
of both these reasoning forms; Section 4.6 shows how, for small errors at least, the quantitative logic DP 
can be treated as a simple numeric extension of the qualitative logic QDP. This last section presents
consistency and consequence results for fragments of both logics. 


4.1 Introduction 

DP is a propositional logic annotated with probability bounds, and has a probabilistic rather than a 
possible world semantics. This allows the sort of inequality reasoning found in Quinlan’s INFERNO 
[28]. Inequality reasoning is an approximation to normative reasoning about point probabilities when a
decision is binary, as explained in Section 3.3. So the justification for DP is approximation, rather than
some fundamental principle about intervals or fuzzy sets for reasoning under uncertainty. In this sense, it
differs in philosophy from Ginsberg’s suggestion [12] or Dubois and Prade’s treatment of syllogisms [13].

QDP has the annotations dropped, and the default component is almost identical to Geffner and 
Pearl’s logic of defaults [7,29] borrowed from Adams’ logic of conditionals [30,19]. QDP is also similar to
Delgrande’s conditional logic NP [1].

QDP is designed to be a qualitative counterpart of DP. It is intended to be an approximation to 
DP for reasoning about “small” but not infinitesimal probabilities. The semantics of QDP complements 
DP and is based on order of magnitude reasoning. Like NP, dynamic aspects of plausible reasoning 
(for instance, involving action and time) are not handled directly by either DP or QDP, although they 
can often be handled with a simple situation calculus, as is done in Section 5.3. In the general case, an
extension of the logic would be required.


4.2 Basic notation 

A standard propositional language denoted Lp is used here. This is formed in the usual manner from a
finite set of atomic propositions P = {p_1, ..., p_n} together with true and false, the standard connectives,
¬ (negation), → (conditional), ∧ (conjunction), ∨ (disjunction) and ↔ (biconditional). “|= A” denotes
that propositional formula A is a theorem of the usual propositional logic.

Probability distributions can be given over the language Lp as follows. An event space Ep, a mutually
exclusive and exhaustive set of events, is readily constructed from a subset of Lp. Given n atomic
propositions P as described above, this would have cardinality 2^n and one such set is given by

$$E_P = \{\, L_1 \wedge \cdots \wedge L_n \mid \text{for } i = 1, \ldots, n,\ L_i = p_i \text{ or } \neg p_i \,\} . \tag{1}$$

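A probability distribution Pr : Ep → [0, 1] maps events to measures of belief. For A, B ∈ Lp,

$$\Pr(A) = \sum_{e \in E_P,\ \models e \rightarrow A} \Pr(e) ,$$

$$\Pr(B \mid A) = \begin{cases} \dfrac{\Pr(A \wedge B)}{\Pr(A)} & \text{if } \Pr(A) > 0 , \\[1ex] 1 & \text{otherwise.} \end{cases}$$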


In many probability texts, if Pr(A) = 0 then Pr(B|A) is undefined. Instead we assert that if Pr(A) = 0
then Pr(B|A) = 1. This means we can reason about conditional probabilities even if the antecedent of 
the conditioning is false. A probability distribution like Pr above is termed a distribution over Lp. 
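The construction above can be sketched in a few lines. This is only an illustration under stated assumptions: the atoms, the weights and all function names are hypothetical, events are represented as truth assignments, and a propositional formula is represented as a Python predicate over an event.

```python
from fractions import Fraction
from itertools import product

ATOMS = ("bird", "flies")  # hypothetical atomic propositions p_1, ..., p_n


def event_space(atoms):
    """All 2**n truth assignments; each one plays the role of an event in Ep."""
    return [dict(zip(atoms, values))
            for values in product((True, False), repeat=len(atoms))]


def pr(formula, dist):
    """Pr(A): total weight of the events in which formula A is true."""
    return sum((weight for event, weight in dist if formula(event)), Fraction(0))


def pr_given(b, a, dist):
    """Pr(B|A), using the convention that Pr(B|A) = 1 whenever Pr(A) = 0."""
    pr_a = pr(a, dist)
    if pr_a == 0:
        return Fraction(1)
    return pr(lambda e: a(e) and b(e), dist) / pr_a


# Hypothetical measures of belief, one per event, summing to 1.
WEIGHTS = [Fraction(9, 20), Fraction(1, 20), Fraction(4, 20), Fraction(6, 20)]
DIST = list(zip(event_space(ATOMS), WEIGHTS))

bird = lambda e: e["bird"]
flies = lambda e: e["flies"]
never = lambda e: False  # a formula that is false in every event

print(pr(bird, DIST))               # 1/2
print(pr_given(flies, bird, DIST))  # 9/10
print(pr_given(flies, never, DIST)) # 1, by the convention for Pr(A) = 0
```

Enumerating all events is exponential in the number of atoms, so this is suitable only for checking small examples by hand.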

The probabilistic logic DP describes constraints on probability distributions over the language Lp. It
is built on the language Dp that is constructed from Lp together with four modal operators: the unary
connectives □ (necessity) and ◇ (possibility), and the binary connectives =>_ε (default with error bound ε) and
≻_ε (likelihood with lower bound ε). There is no nesting of these operators. Nesting would represent second
and higher-order probability statements [31], as used in learning to reason about belief in probabilistic
models [25], but is unnecessary for the initial treatment here. The operators can be interpreted as follows.


□A: A is necessarily true in any situation. 


◇A: Some situation can possibly arise in which A is true.

A =>_ε B: Given that you know just A about the current situation, it is safe to infer B by default (with
error in belief at most ε).

A ≻_ε B: Given that you know just A about the current situation, B is at least likely (with belief no
less than ε).

In the language QDp the subscripts are dropped. QDp also has successively weaker forms of the likelihood
operator. A ≻ B denotes “likely”, whereas A ≻^2 B would denote “barely likely”, etc. This is related to
the iterated likelihood operator found in [14].

A ≻^n B: Given that you know just A about the current situation, B is at least likely to be ... to be
likely (to order n).

Both the likelihood and default operators are conditional operators, in a similar sense to [1]. For 
instance, in the cases above each is conditioned on A. It will be shown later that it is unnecessary for the 
necessity and possibility operators to have conditional forms. 

Definition 4.1 The sentences or well formed formulae (wffs) of $D_P$ comprise the least set such that

1. If $A \in L_P$ then $\Box A$ is a wff.

2. If $A, B \in L_P$ then $A \Rightarrow_\epsilon B$ is a wff for $0 < \epsilon < 1$.

3. If $D, E \in D_P$ then $\neg D$ and $D \to E$ are wffs.

Conjunction ($\wedge$), disjunction ($\vee$) and biconditional ($\leftrightarrow$) on sentences in $D_P$, and possibility ($\Diamond$) and
likelihood ($\succ$) on sentences in $L_P$ are introduced by definition.

Definition 4.2 The sentences or well formed formulae of $QD_P$ consist of the sentences of $D_P$ with
the numeric subscripts dropped from the default operator “$\Rightarrow$” and the likelihood operator “$\succ$”. The operator “$\succ$” may have optional integer
superscripts weakening the order of likelihood.

Some examples of QDP sentences were given in the introduction. The four modal operators have
operator precedence midway between disjunction and conditional/biconditional. So a disjunction binds
before a default operator, and a default operator binds before a conditional. For instance, the sentence

$$A \vee B \wedge C \Rightarrow D \to \Diamond E \wedge F$$

is identical to the sentence

$$((A \vee (B \wedge C)) \Rightarrow D) \to \Diamond(E \wedge F) .$$

However,

$$\Diamond A \wedge B \Rightarrow C$$

is identical to the sentence

$$(\Diamond A) \wedge (B \Rightarrow C) ,$$

because otherwise the sentence does not parse.


4.3 Semantics 

In DP, “$\models_{Pr} D$” denotes that $D \in D_P$ is true for the probability distribution $Pr$. $Pr$ plays a role not
unlike an interpretation in standard propositional logic.

Definition 4.3 Given a probability distribution $Pr$ on $L_P$, “$\models_{Pr}$” is defined on sentences from $D_P$ as
follows.

1. $\models_{Pr} \Box A$ if and only if $Pr(A) = 1$.

2. $\models_{Pr} A \Rightarrow_\epsilon B$ if and only if $Pr(B|A) \ge 1 - \epsilon$.

3. $\models_{Pr} \neg D$ if and only if not $\models_{Pr} D$.

4. $\models_{Pr} D \to E$ if and only if not $\models_{Pr} D$ or $\models_{Pr} E$.

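Possibility and likelihood are by definition dual operators for necessity and default respectively. “$\Diamond A$”
is defined as “$\neg\Box\neg A$”, so $\models_{Pr} \Diamond A$ if and only if $Pr(A) > 0$. “$A \succ_e B$” is defined as “$\neg(A \Rightarrow_e \neg B)$”, so
$\models_{Pr} A \succ_e B$ if and only if $Pr(B|A) > e$. In addition, “$\Rightarrow_\epsilon B$” is shorthand for “$true \Rightarrow_\epsilon B$”, and likewise for
the likelihood operator “$\succ_e$”.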

If the necessity operator were to have a conditional version, it would have the semantics $Pr(B|A) = 1$,
but since this is equivalent to $Pr(A \to B) = 1$, a conditional form of necessity can be adequately
constructed as $\Box(A \to B)$. Likewise, a conditional version of the possibility operator can be constructed
as $\Diamond A \to \Diamond(A \wedge B)$.
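
As an illustration of these semantics, the four operators can be evaluated against a finite distribution. The sketch below is not part of the logic itself: the function names and the distribution are hypothetical, the default condition is read as $Pr(B|A) \ge 1 - \epsilon$, likelihood is computed purely through its dual, and the example only uses antecedents of positive probability. Exact fractions avoid rounding problems in the test $Pr(A) = 1$.

```python
from fractions import Fraction as F

def prob(dist, a):
    """Pr(A) over a list of (event, probability) pairs."""
    return sum((p for event, p in dist if a(event)), F(0))

def cond_prob(dist, b, a):
    """Pr(B|A), with the convention Pr(B|A) = 1 when Pr(A) = 0."""
    pa = prob(dist, a)
    return F(1) if pa == 0 else prob(dist, lambda e: a(e) and b(e)) / pa

def necessary(dist, a):          # Box A: Pr(A) = 1
    return prob(dist, a) == 1

def possible(dist, a):           # Diamond A, defined as not Box not A
    return not necessary(dist, lambda e: not a(e))

def default(dist, a, b, eps):    # A =>_eps B: Pr(B|A) >= 1 - eps (assumed reading)
    return cond_prob(dist, b, a) >= 1 - eps

def likely(dist, a, b, e):       # dual of default: not (A =>_e not B)
    return not default(dist, a, lambda ev: not b(ev), e)

# Hypothetical distribution; the numbers are illustrative only.
dist = [
    ({"bird": True,  "flies": True},  F(9, 20)),
    ({"bird": True,  "flies": False}, F(1, 20)),
    ({"bird": False, "flies": True},  F(0)),
    ({"bird": False, "flies": False}, F(1, 2)),
]
bird = lambda e: e["bird"]
flies = lambda e: e["flies"]
not_flies = lambda e: not e["flies"]
flies_implies_bird = lambda e: (not e["flies"]) or e["bird"]

print(default(dist, bird, flies, F(1, 10)))     # birds fly, error at most 1/10
print(likely(dist, bird, not_flies, F(1, 20)))  # a non-flying bird is likely to order 1/20
print(necessary(dist, flies_implies_bird))      # conditional necessity as Box(A -> B)
```

The last line checks the construction of conditional necessity: `necessary(dist, flies_implies_bird)` agrees with `cond_prob(dist, bird, flies) == 1`.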

A map translating probabilities to subsequent modal representation is given in Figure 1. By convention,
the default operator “$\Rightarrow$” is subscripted by Greek letters $\epsilon$, $\delta$, etc., which are intended to be small ($\ll 1$), whereas the likelihood operator “$\succ$” is
subscripted by the letters $e$, $f$, etc., which are intended to be not as small. This is no absolute restriction;
it gives an indication of the intent of the sentences.
Definition 4.4 A sentence $D \in D_P$ is a theorem of the probabilistic logic DP if $\models_{Pr} D$ for all possible
probability distributions $Pr$. This is denoted “$\models_{DP} D$”. $D$ is a consequence of a set of sentences $\Gamma$ if
there are $D_1, \ldots, D_n \in \Gamma$ such that $\models_{DP} (D_1 \wedge \cdots \wedge D_n) \to D$. $D$ is consistent if $\neg D$ is not a theorem of
DP.

To obtain qualitative rules about default and likelihood from the quantitative rules in DP, we can
perform order of magnitude reasoning. We can consider a representative default error, $\epsilon$, where $\epsilon$ might
be less than 0.01, or whatever the decision context requires. Likewise, we can consider a representative
default likelihood, $e$, where $e$ might be greater than 0.05, say. The choice for modelling particular limits
rather than some arbitrary infinitesimal is motivated by the decision theoretic argument at the beginning
of Section 3.3. In order to approximate the behaviour of our reasoning with these particular limits in
mind, we can parameterise the system by $\epsilon$ and $e$ and consider only approximate calculations to $O(\epsilon)$
and $O(e)$. A map translating probabilities to these kinds of qualitative values is given in Figure 2. The
hashed regions represent those fuzzy boundaries where the qualitative reasoning becomes most susceptible
to error.

QDP is defined in a manner such that $\epsilon$ and $e$ are arbitrarily small, but $\epsilon$ is also arbitrarily smaller
than $e$. Of course, it is unrealistic to expect arbitrarily small magnitudes for $\epsilon$ and $e$ to be achieved, let
alone the right relative magnitudes. This, however, is irrelevant as far as the application of the logic is
concerned. The “arbitrarily small” magnitudes are only being used as a theoretical device to investigate
the approximate behaviour of notions like “$\Rightarrow_\epsilon$” and “$\succ_e$” for $\epsilon$ being small and $e$ being not quite as small
(see also [7, Section 10.2.4]). In addition, the choice of relative magnitude between $\epsilon$ and $e$ is a particular
design decision that might just as well have been made some other way. Applications of QDP should of
course take this into account.

Definition 4.5 A sentence $D \in QD_P$ is a theorem of the qualitative probabilistic logic QDP if there
exists a theorem $D' \in D_P$ corresponding to $D$ (that is, identical except for any super or subscripts), in
which all subscripts to “$\Rightarrow$” and “$\succ$” are parameterised by some variables $\epsilon$ and $e$, each subscript to
“$\Rightarrow$” is of order $\epsilon$ as $\epsilon$ approaches 0 and $e$ remains finite, and each subscript in $D'$ corresponding to “$\succ^n$”
in $D$ is of order $e^n$ as $e$ and $\epsilon/e$ approach 0. This is denoted “$\models_{QDP} D$”. Consequence and consistency
are defined as before.

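From the definition of the likelihood operator “$\succ$” (for $e > \epsilon$) it follows that

$$\models_{DP} \neg(\succ_e A \wedge \Rightarrow_\epsilon \neg A) .$$

Consequently,

$$\models_{QDP} \neg(\succ A \wedge \Rightarrow \neg A) . \qquad (2)$$

That is, if something is likely, its negation cannot be true by default. But the complementary sentence,
$(\succ A \vee \Rightarrow \neg A)$, is not a theorem.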

It follows directly from this definition that the set of theorems of QDP is closed under application of
modus ponens and conjunction. That is,

$$\models_{QDP} D \text{ and } \models_{QDP} D \to E \text{ implies } \models_{QDP} E ,$$

$$\models_{QDP} D \text{ and } \models_{QDP} E \text{ if and only if } \models_{QDP} D \wedge E .$$

Because the definition of QDP is based on an order of magnitude argument, there are potential pitfalls
with these closure properties. Order of magnitude arguments invariably give dubious results when the
constant factors become too large. Suppose a lottery has 1,000,000 participants. The following sentence
can be shown to be a theorem of DP.

$$\bigwedge_{i=1}^{1{,}000{,}000} \Rightarrow_\epsilon (\text{person } i \text{ will not win the lottery}) \;\to\; \Rightarrow_{1{,}000{,}000 \cdot \epsilon} (\text{no-one will win the lottery}) . \qquad (3)$$
Moreover, replacing the error bound $1{,}000{,}000 \cdot \epsilon$ by $999{,}999 \cdot \epsilon$ yields a sentence that is not a theorem of
DP. Without the error bounds, the sentence would seem to read “if, by default, any particular person
will not win the lottery, then, by default, no-one will win the lottery at all”. The illusory lottery paradox
has reappeared. In DP this is not the correct reading because with the natural value for $\epsilon$, $1/1{,}000{,}000$, the
right hand side of rule (3) is impotent (its default error is 1). In QDP, unfortunately, it is the correct
reading: QDP drops the subscripts (both are of order $\epsilon$ as $\epsilon$ approaches 0) and loses the error information.
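
The arithmetic behind the lottery example is easy to check. The following sketch uses hypothetical helper names and assumes the simplest lottery model, in which each of the $n$ participants wins with the same probability and at most one person wins; it is only meant to show how the per-person errors add up.

```python
from fractions import Fraction

def conjoined_error(n, eps):
    """Error bound for the conjunction of n defaults that each have error eps."""
    return min(Fraction(1), n * eps)

def prob_no_winner(n, eps):
    """Pr(no-one wins) when each of n people wins with probability eps, exclusively."""
    return 1 - n * eps

n = 1_000_000

# With the natural error 1/n the conclusion has error bound 1: it says nothing.
natural = Fraction(1, n)
print(conjoined_error(n, natural))

# With a much smaller per-person error the conclusion is still informative.
tiny = Fraction(1, 10**9)
print(conjoined_error(n, tiny))

# In this model the bound n * eps is met exactly, so (n - 1) * eps is too small.
print(prob_no_winner(n, tiny) >= 1 - n * tiny)
print(prob_no_winner(n, tiny) >= 1 - (n - 1) * tiny)
```

Under these assumptions the error of the conclusion grows linearly with the number of defaults combined, which is exactly the information lost when the subscripts are dropped.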

If we wish a purely qualitative default logic to be closed under conjunction and modus ponens, two
seemingly intuitive properties, then we have no choice but to accept that the above kind of anomaly
may occur. People get around this with an intuitive knowledge of where plausible reasoning is likely to
break down, for instance, by not making default or likelihood inference to any great depth: “don’t rest
your argument on too many assumptions, something is bound to go wrong along the way!”. Default and
likelihood reasoning may well produce incorrect results when carried on indefinitely; they should, however, 
be “locally” correct. Imprecision is an inherent property of plausible reasoning; so knowledge of how to 
contain the imprecision is a prerequisite for safe plausible reasoning. Hence the importance of DP in 
understanding QDP. 


4.4 Basic theorems 

This section introduces a few basic theorem schemata, and discusses several notable but unrelated prop- 
erties of the logics. Examples of using the logic QDP are given later in Section 5. 

First, the default and likelihood operators can be broken down into two components, according to
whether the antecedent is possible or impossible. This is done using

$$\models_{QDP} (A \Rightarrow B) \leftrightarrow (\Box\neg A \vee (\Diamond A \wedge (A \Rightarrow B))) . \qquad (4)$$

The second component here, $\Diamond A \wedge (A \Rightarrow B)$, is referred to as the proper default operator, and likewise for
the likelihood operator. This corresponds to Adams’ notion of the conditional over “proper” distributions
[19, p. 49], that is, distributions where the antecedent of the operator must be possible. The unmodified,
improper version of the default, $A \Rightarrow B$, corresponds to Adams’ original notion of the conditional [30].
While the mathematics of the improper default is generally easier, it is sometimes better to break down
the default and likelihood operators into the two components, and then put the pieces back at the end.

Second, both DP and QDP can be seen as natural extensions to propositional logic. For instance,
the theorems for “$\Box$” given later in Table 1 encode the provability relation in propositional logic. The
following lemma further highlights the connection.


Lemma 4.1 First, all substitution instances of the theorems and rules of inference of standard propositional
logic that are sentences of $D_P$ hold for DP. Second, in DP necessary equivalences can be substituted.
That is,

$$\models_{DP} \Box(A \leftrightarrow B) \to (D(A) \leftrightarrow D(B)) ,$$

where $D(A)$ denotes any sentence of $D_P$ with an occurrence of the propositional formula $A$ in a particular
position. Corresponding results for QDP hold.


Third, some examples of theorem schemata of DP are given in Tables 1—3. These hold for d, e, e and 
6 all less than Certain dual forms, either on or on are given in the third column. These are 
obtained by restructuring the formula and converting either “$\succ$” or “$\Rightarrow$” to its dual. In each case, either
the original form or the dual form can be proven by the consistency or consequence theorems presented
in Section 4.6. For each DP theorem in Tables 1-3, the $QD_P$ sentence obtained by removing subscripts
(and in the case of the duals for theorems T14 and T16, making the operator is a theorem 

of QDP. 

One important aspect of any DP theorem is the relationship between errors on the defaults and
likelihoods. For instance, we can rewrite theorem T17' as

$$(A \succ_e C) \wedge (B \succ_d C) \to A \vee B \succ_f C ,$$

and note that this only holds for some values of $f$, and in particular holds for

$$f \le \frac{ed}{e + d - ed} < \min(e, d) .$$

In this case, $f$ represents an error propagation function, which relates the errors in the DP theorem. If we
were to apply this theorem in some chain of reasoning to deduce $A \vee B \succ_f C$, then we could either choose
to forget about the error $f$, as we implicitly do when using QDP, or we could use the error propagation
function to compute a value for $f$ from $e$ and $d$. Bear in mind that an error propagation function only
represents a worst-case bound on error. A more precise probabilistic analysis may find
that the error has shrunk to nothing; the error propagation function merely gives an upper bound on
what the error can be. In Section 4.6 it is shown that for small errors, DP behaves just like QDP, so a system
for reasoning about defaults and likelihoods can be constructed using the qualitative logic QDP, and then
optionally, error tracking facilities can be grafted on top with the use of error propagation functions to
give approximate probabilistic reasoning.
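
To make the error propagation function for theorem T17' concrete, the sketch below computes the bound $ed/(e + d - ed)$ for sample values and checks it against one boundary distribution. The function name, the sample values and the distribution are all hypothetical choices made for illustration.

```python
from fractions import Fraction

def t17_bound(e, d):
    """Bound e*d / (e + d - e*d) on the likelihood f of the conclusion."""
    return e * d / (e + d - e * d)

e, d = Fraction(1, 2), Fraction(1, 4)
f = t17_bound(e, d)

# Boundary distribution (hypothetical): A and B share all of their C-mass s,
# their non-C masses x and y are disjoint, and (A or B) is certain.
s = f                     # mass of A and B and C
x = s * (1 - e) / e       # mass of A and not C, chosen so that Pr(C|A) = e
y = s * (1 - d) / d       # mass of B and not C, chosen so that Pr(C|B) = d

pr_c_given_a = s / (s + x)
pr_c_given_b = s / (s + y)
pr_c_given_a_or_b = s / (s + x + y)

print(f, min(e, d))
print(pr_c_given_a, pr_c_given_b, pr_c_given_a_or_b)
```

In this boundary case the conditional belief in $C$ given $A \vee B$ equals the bound, so no larger value of $f$ could be guaranteed from $e$ and $d$ alone.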

Finally, theorems of the logics can be generalised by uniformly changing conditioning information. 


Lemma 4.2 Any theorem of DP (QDP) can be transformed to another by uniformly changing conditioning
information. Given conditioning information $C$, a formula $D$ is transformed by uniformly applying
the following transformations to all non-propositional operators in “$\Diamond true \to D$”:

$$\begin{array}{rcl}
\Box A & \mapsto & \Box(C \to A) \\
\Diamond A & \mapsto & \Diamond(C \wedge A) \\
B \Rightarrow_\epsilon A & \mapsto & (C \wedge B) \Rightarrow_\epsilon A \\
B \succ_e A & \mapsto & (C \wedge B) \succ_e A
\end{array}$$

Versions of some of the theorems extended using this transformation are given in Table 4. Notice that
for theorems T11', T12', T13', and T14', the initial term “$\Diamond C$” has been dropped: this is safe because
the “$\Rightarrow$” and “$\succ$” operators are always true and false respectively if the conditioning part is necessarily
equivalent to false. A similar situation holds for theorem T6'.


4.5 Relevance 

The antecedent of a default or likelihood corresponds to the context in which the rule can be applied. So
the rule $B \Rightarrow C$ can be applied when we know just $B$, nothing more or less. This feature is inherited from
the semantics of the conditioning operator in probability theory. As a result, defaults and likelihoods
cannot have their antecedents arbitrarily specialised. That is, the $QD_P$ sentence

$$(B \Rightarrow C) \to (A \wedge B \Rightarrow C)$$

is not a theorem of QDP; so the context $B$ cannot in general be specialised to include other information,
in this case $A$.

A second related feature of the logics is that there is no transitive relation applying to defaults or
likelihoods. The same holds for NP [1, Section 7]. That is, the $QD_P$ sentence

$$(A \Rightarrow B) \wedge (B \Rightarrow C) \to A \Rightarrow C$$

is not a theorem of QDP. For instance, a counterexample to this transitive sentence is that penguins
are birds, most birds fly, and penguins do not fly. So we would not expect the sentence to be a theorem.
However, if we are told that the yellow-bellied warbler is a bird, and know nothing else about it, it is quite
plausible to us that the warbler should fly.
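
The penguin counterexample can be checked numerically. The sketch below uses a hypothetical distribution and error bound, and reads a default as $Pr(B|A) \ge 1 - \epsilon$; it shows both premises of the transitive sentence holding while its conclusion fails, and the same for specialising the antecedent.

```python
from fractions import Fraction as F

def prob(dist, a):
    return sum((p for event, p in dist if a(event)), F(0))

def cond_prob(dist, b, a):
    pa = prob(dist, a)
    return F(1) if pa == 0 else prob(dist, lambda e: a(e) and b(e)) / pa

def default(dist, a, b, eps):
    """A =>_eps B, read as Pr(B|A) >= 1 - eps (assumed reading)."""
    return cond_prob(dist, b, a) >= 1 - eps

# Hypothetical population: a few penguins, many flying birds, some non-birds.
dist = [
    ({"penguin": True,  "bird": True,  "flies": False}, F(1, 100)),
    ({"penguin": False, "bird": True,  "flies": True},  F(94, 100)),
    ({"penguin": False, "bird": False, "flies": False}, F(5, 100)),
]
eps = F(1, 20)

penguin = lambda e: e["penguin"]
bird = lambda e: e["bird"]
flies = lambda e: e["flies"]
penguin_and_bird = lambda e: e["penguin"] and e["bird"]

print(default(dist, penguin, bird, eps))            # Penguin => Bird
print(default(dist, bird, flies, eps))              # Bird => Flies
print(default(dist, penguin, flies, eps))           # Penguin => Flies fails
print(default(dist, penguin_and_bird, flies, eps))  # specialised rule fails too
```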

So for plausible reasoning in certain situations, we would like some form of transitive reasoning. Notice
the $QD_P$ sentence

$$(A \Rightarrow B) \wedge (A \wedge B \Rightarrow C) \to A \Rightarrow C$$

is a theorem of QDP (T12' in fact). Suppose we can obtain some additional information that implies the
rule $B \Rightarrow C$ is the same as $A \wedge B \Rightarrow C$, so the condition $A$ in the antecedent is not relevant. Then this
additional information together with theorem T12' shows the original transitivity form above does hold.

This ability to modify the antecedent of a default or plausible rule requires reasoning about relevance,
where a condition in the antecedent is irrelevant if it can be added or deleted while still maintaining the
correctness of the rule. In probability theory, such information can be obtained in a number of ways. We
can represent this information using the notion of independence, and in a more limited sense, following
Neufeld et al. [20], the notion of favouring.

Definition 4.6 Proposition $A$ is independent of proposition $B$ given proposition $C$ if

$$Pr(B|C) = Pr(B|C \wedge A) .$$

Proposition $A$ favours proposition $B$ given proposition $C$ if

$$Pr(B|C) < Pr(B|C \wedge A) .$$

Lemma 4.3 If proposition $A$ is independent of proposition $B$ given proposition $C$ then the following
sentences of $QD_P$ are true:

$$C \Rightarrow B \leftrightarrow (C \wedge A) \Rightarrow B ,$$

$$C \Rightarrow A \leftrightarrow (C \wedge B) \Rightarrow A ,$$

$$C \succ A \leftrightarrow (C \wedge B) \succ A ,$$

$$C \succ B \leftrightarrow (C \wedge A) \succ B .$$

If proposition $A$ favours proposition $B$ given proposition $C$ then the sentences above only hold for the
forward direction, that is, replacing “$\leftrightarrow$” by “$\to$”.
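
The role of independence and favouring can be seen on small distributions. The sketch below is illustrative only: the helper names, both distributions and the error bounds are hypothetical, the context $C$ is taken to be trivially true, and only the first sentence of the lemma (for the default operator, read as $Pr(B|A) \ge 1 - \epsilon$) is checked.

```python
from fractions import Fraction as F

def prob(dist, a):
    return sum((p for event, p in dist if a(event)), F(0))

def cond_prob(dist, b, a):
    pa = prob(dist, a)
    return F(1) if pa == 0 else prob(dist, lambda e: a(e) and b(e)) / pa

def default(dist, a, b, eps):
    return cond_prob(dist, b, a) >= 1 - eps

def independent(dist, a, b, c):
    """A is independent of B given C: Pr(B|C) = Pr(B|C and A)."""
    return cond_prob(dist, b, c) == cond_prob(dist, b, lambda e: c(e) and a(e))

def favours(dist, a, b, c):
    """A favours B given C: Pr(B|C) < Pr(B|C and A)."""
    return cond_prob(dist, b, c) < cond_prob(dist, b, lambda e: c(e) and a(e))

A = lambda e: e["a"]
B = lambda e: e["b"]
C = lambda e: True                      # trivial context
CA = lambda e: C(e) and A(e)

def table(paa, pab, pba, pbb):
    """Distribution over (a, b) in the order TT, TF, FT, FF."""
    cells = [(True, True), (True, False), (False, True), (False, False)]
    return [({"a": a, "b": b}, p) for (a, b), p in zip(cells, (paa, pab, pba, pbb))]

indep = table(F(9, 20), F(1, 20), F(9, 20), F(1, 20))
fav = table(F(1, 2), F(0), F(1, 4), F(1, 4))
bounds = [F(1, 20), F(1, 10), F(1, 4)]

# Independence: the two defaults agree for every error bound tried.
agree = all(default(indep, C, B, eps) == default(indep, CA, B, eps) for eps in bounds)
# Favouring: the forward direction holds, the backward one can fail.
forward = all((not default(fav, C, B, eps)) or default(fav, CA, B, eps) for eps in bounds)
backward_fails = default(fav, CA, B, F(1, 10)) and not default(fav, C, B, F(1, 10))
print(agree, forward, backward_fails)
```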

It should be clear from this lemma that methods for reasoning about relevance are vital in plausible
reasoning in order to modify plausible rules so they can be applied to each particular context. Some
examples are given in Section 5. Causal (or Bayesian) networks can be used for this form of reasoning,
and the maximum entropy method provides a way of making independence assumptions “by default” [7].


4.6 Consistency and consequence 

The question of whether a sentence from $D_P$ is consistent can be converted to the question of whether
one of a set of simplex problems in the $2^n$ variables $\{Pr(p) \mid p \in E_P\}$ has a solution. Consequently, DP
is decidable (this is similar to Probabilistic Logic [32]). For the purposes of this paper, it is not worth
obtaining axiom schemata and rules of inference for the whole of DP, since we are really only interested
in the case where the errors are quite small. A system encompassing the whole of DP would most likely
degenerate to the kind found in [33, p. 10], where the schemata are close to an enumeration of primitive
operations in the simplex algorithm. Fortunately, a different approach is available. Adams [30, 19] has
developed tests for consistency and entailment in his conditional logic, which have been extended by
Goldszmidt and Pearl [34]. Similar consistency and consequence tests are presented below for the default
and likelihood components of DP, and are easily adapted to QDP. These results show that reasoning can
be performed using the qualitative system QDP, and the approximate error bounds of DP propagated
concurrently.

Tests on consistency and consequence are presented below in terms of a clausal form. Consider the default
component of QDP. An arbitrary sentence containing the default, possibility and necessity operators
can be turned into a conjunction of clauses, where each clause has the form

$$\Box U \wedge \bigwedge_{i \in I_V} \Diamond V_i \wedge \bigwedge_{i \in I_A} (A_i \Rightarrow B_i) \to \bigvee_{i \in I_C} (G_i \Rightarrow H_i) ,$$

for some index sets $I_V$, $I_A$ and $I_C$. Notice that all necessity and possibility operators have been gathered
in the antecedent of the clause, by converting $\neg\Box A$ to $\Diamond\neg A$ where necessary, and all the necessity operators
have been combined into one using theorem T3.

It is also of interest, though not essential for the development of this section, to consider a more precise
interpretation of what it means for a clause to be a theorem in QDP. Lemma 4.4 uses the above clausal
form to reinterpret the definition of a QDP theorem.

Lemma 4.4

$$\models_{QDP} \Box U \wedge \bigwedge_{i \in I_V} \Diamond V_i \wedge \bigwedge_{i \in I_A} (A_i \Rightarrow B_i) \to \bigvee_{i \in I_C} (G_i \Rightarrow H_i) ,$$

if and only if there exists a $\delta$ and $\eta$ such that for all $\epsilon < \eta$

$$\models_{DP} \Box U \wedge \bigwedge_{i \in I_V} \Diamond V_i \wedge \bigwedge_{i \in I_A} (A_i \Rightarrow_\epsilon B_i) \to \bigvee_{i \in I_C} (G_i \Rightarrow_{\delta\epsilon} H_i) .$$

For the $D_P$ sentence in the lemma, $\delta$ is an error propagation factor, and $\delta\epsilon$ is an error propagation function,
which in this case is linear. The larger the value of $\delta$, the faster error can propagate when this particular
clause is applied in some chain of reasoning. By comparison, Adams’ notion of entailment corresponds to:

if and only if for all $\epsilon$ there exists a $\delta$ such that

$$\models_{DP} \Box U \wedge \bigwedge_{i \in I_V} \Diamond V_i \wedge \bigwedge_{i \in I_A} (A_i \Rightarrow_\delta B_i) \to \bigvee_{i \in I_C} (G_i \Rightarrow_\epsilon H_i) .$$

The difference between the two notions is that in QDP error is restricted to propagate linearly.

Likewise, we can convert sentences containing the likelihood, necessity and default operators into a
clausal form. The corresponding notion of a QDP theorem is given in Lemma 4.5.

Lemma 4.5

$$\models_{QDP} \Box U \wedge \bigwedge_{i \in I_V} \Diamond V_i \wedge \bigwedge_{i \in I_A} (A_i \succ B_i) \to \bigvee_{i \in I_C} (G_i \succ^{m_i} H_i) ,$$

if and only if there exists a $\delta$ and $\eta$ such that for all $e < \eta$

$$\models_{DP} \Box U \wedge \bigwedge_{i \in I_V} \Diamond V_i \wedge \bigwedge_{i \in I_A} (A_i \succ_e B_i) \to \bigvee_{i \in I_C} (G_i \succ_{\delta e^{m_i}} H_i) .$$

In this case, the error propagation functions, $\delta e^{m_i}$, are polynomial, and $\delta$ is the error propagation factor.
Since a smaller likelihood represents more room for error, the smaller the value of $\delta$, the faster error will
propagate when this particular clause is applied in some chain of reasoning.

Results below on consistency and consequence of clauses using the default operator are extensions of 
several theorems in [30,19], and similar extensions can be found in [34], although Adams’ terminology 
is not used here. The extensions introduce necessity and possibility. Consistency turns out to be the 
operation on which the three kinds of consequence tests are based. 

Logical tests for consistency and consequence are given in Theorem 4.6 for clauses containing the
default operator. These are given for DP and, because the error propagation functions are linear, can
be extended to QDP simply by dropping the error subscripts. While the consistency test in the theorem
with its “there exists a subset $J$” looks fairly involved, the kind of trick used in [34, Sect. 4] can be applied
to develop an algorithm that constructs the subset efficiently.
Theorem 4.6 Consider the $D_P$ sentence $D$ given by

$$\Box U \wedge \bigwedge_{i \in I_V} \Diamond V_i \wedge \bigwedge_{i \in I_A} (A_i \Rightarrow_{\epsilon_i} B_i) ,$$

where $\epsilon_i < 1/|I_A|$ for $i \in I_A$.

1. The sentence $D$ is inconsistent if and only if there exists some $j \in I_V$ such that $U \wedge V_j$ is unsatisfiable
or there exists some non-empty $J \subseteq I_A$ such that

$$U \wedge \Big(\bigvee_{i \in J} A_i\Big) \wedge \bigwedge_{i \in J} (A_i \to B_i)$$

is unsatisfiable.

2. The $D_P$ sentence $C \Rightarrow_\delta B$ is a consequence of $D$ for some $\delta < 1$ if and only if $D \wedge (C \Rightarrow_\delta \neg B)$ is
inconsistent. This holds if and only if $D$ itself is inconsistent or there exists some $J \subseteq I_A$ such that

$$U \wedge \Big(C \vee \bigvee_{i \in J} A_i\Big) \wedge \bigwedge_{i \in J} (A_i \to B_i) \wedge (C \to \neg B)$$

is unsatisfiable. If $C \Rightarrow_\delta B$ is a consequence for some $\delta$ and can be demonstrated so using $J$, then
$\delta = \sum_{i \in J} \epsilon_i$ is a correct error propagation function.

3. The $D_P$ sentence $\Diamond C$ is a consequence of $D$ if and only if $D \wedge \Box\neg C$ is inconsistent.

4. The $D_P$ sentence $\Box C$ is a consequence of $D$ if and only if $D$ itself is inconsistent or $\models U \to C$.

When determining the consequences of a consistent set, whether a possibility is a consequence may depend 
on all elements of the set including the defaults, whereas whether a necessity is a consequence depends on 
only the other necessities. In this case, necessity can only follow from other necessities or inconsistency. 
Also note that the theorem shows error propagates additively when reasoning with default rules. This is 
a clear warning against long chains of such reasoning. 

The resultant algorithm for checking consistency of QDP sentences is given in Figure 3. 

Corollary 4.6.1 The defaults-consistency algorithm is correct and uses at most $|I_V| + |I_A|^2/2$ satisfiability
tests on the underlying propositional logic.

The first step of this algorithm also forms the basis of testing the consistency of sentences containing
only the necessity and possibility operator. Any such sentence can be converted to a conjunctive normal
form consisting of a disjunction of conjuncts of the form $\Box U \wedge \bigwedge_{i \in I_V} \Diamond V_i$. Each conjunct can be tested for
consistency using the first step.

Corollary 4.6.2 Let the QDP sentence $D$ containing only the necessity and possibility operators be in
conjunctive normal form, and let $|D|$ denote the number of modal operators in the sentence. Then the
consistency of $D$ can be determined using less than $|D|$ satisfiability tests on the underlying propositional
logic.

The drawback with this result, however, is that the size of the conjunctive normal form of a sentence can 
be exponential in the size of the original sentence. 

Tests for consistency and consequence using the likelihood operator are given in Theorem 4.7. 

Theorem 4.7 Consider the $D_P$ sentence $D$ given by

$$\Box U \wedge \bigwedge_{i \in I_V} \Diamond V_i \wedge \bigwedge_{i \in I_A} (A_i \succ_{e_i} B_i) ,$$

where e, < for i E I A - Let J min denote the least subset of I A , I f *uch that U A, 6 j A Aj A Bj is 
satisfiable for all j € La — I- Such a minimum set is unique . 
1. The sentence $D$ is inconsistent if and only if there exists some $j \in I_V$ such that $U \wedge \bigwedge_{i \in I_{min}} \neg A_i \wedge V_j$
is unsatisfiable.

2. The $D_P$ sentence $C \succ_f B$ is a consequence of $D$ for some $f < 1$ if and only if $D$ is inconsistent or
there exists an ordered subset of the indices in $I_A - I_{min}$, $i_1, \ldots, i_h$, possibly empty ($h = 0$), such
that for $j = 1, \ldots, h$,

$$\models U \wedge \bigwedge_{i \in I_{min}} \neg A_i \wedge A_{i_j} \wedge B_{i_j} \wedge \bigwedge_{k < j} \neg A_{i_k} \to (C \wedge B) , \text{ and} \qquad (5)$$

$$\models U \wedge \bigwedge_{i \in I_{min}} \neg A_i \wedge \bigwedge_{k \le h} \neg A_{i_k} \to (C \to B) . \qquad (6)$$

If consequence holds, then a lower bound on $f$, the error propagation function, is given by

$$f \ge \left(\frac{e}{1 + e}\right)^h , \quad \text{where } e = \min_{1 \le k \le h} e_{i_k} ,$$

although the error propagation function can be linear in the $e_i$ in some cases.

3. The $D_P$ sentence $\Diamond C$ is a consequence of $D$ if and only if $D \wedge \Box\neg C$ is inconsistent.

4. The $D_P$ sentence $\Box C$ is a consequence of $D$ if and only if $D$ is inconsistent or

$$\models U \wedge \bigwedge_{i \in I_{min}} \neg A_i \to C .$$

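There is also a special case of this theorem that applies to non-iterated versions of the likelihood operator.

Corollary 4.7.1 Consider the $QD_P$ sentence $D$ given by

$$\Box U \wedge \bigwedge_{i \in I_V} \Diamond V_i \wedge \bigwedge_{i \in I_A} (A_i \succ B_i) .$$

The $QD_P$ sentence $C \succ B$ is a consequence of $D$ if $D$ is inconsistent, or there exists some $I \subseteq I_A - I_{min}$
such that for $j \in I$,

$$\models U \wedge \bigwedge_{i \in I_{min}} \neg A_i \wedge A_j \wedge B_j \to (C \wedge B) , \text{ and} \qquad (7)$$

$$\models U \wedge \bigwedge_{i \in I_{min}} \neg A_i \wedge \bigwedge_{j \in I} \neg A_j \to (C \to B) . \qquad (8)$$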

I conjecture that the converse of this theorem also holds. The dual form of the corollary, converted to 
apply to defaults, allows a disjunction of defaults to be the consequence of a single default. An example 
of this corollary is theorem T17 and its dual. 

An algorithm for checking consistency of QDP sentences is given in Figure 4. Step 2(b) has been 
added to this to make the algorithm more efficient when some of the likelihood operators are proper. 

Corollary 4.7.2 The likelihood-consistency algorithm is correct and uses at most $|I_V| + |I_A|^2/2$ satisfiability
tests on the underlying propositional logic.

An algorithm for checking consequence is given in Figure 5. This algorithm assumes the consistency check 
has already been made. The error propagation function in this case can be taken from Theorem 4.7 part 2, 
and a tighter error propagation function is given in the proof of that theorem. 

Corollary 4.7.3 The likelihood-consequence algorithm is correct and uses at most $(|I_A| + 1)^2/2$ satisfiability
tests on the underlying propositional logic.

5 Applications 

This section demonstrates the use of the qualitative logic QDP on three anecdotal problems that recur
in the default reasoning literature.
The first example resolves the paradox of the “vanishing subclasses”. The second example demonstrates
how reasoning about independence using causal networks can be integrated with the forms of plausible
reasoning just developed. The final example is the classic Yale shooting problem [8]. This example
highlights a subtle problem with the situation calculus when it is used for plausible reasoning.


5.1 The “vanishing” emus 

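Neufeld et al. have criticised the modelling of default reasoning based on infinitesimal probabilities [20,
p. 123] on the grounds that it makes “subclasses vanish”. Consider the following rules:

$$\text{Emu} \to \text{Bird} , \qquad (9)$$

$$\text{Emu} \Rightarrow \neg\text{Flies} , \qquad (10)$$

$$\text{Bird} \Rightarrow \text{Flies} . \qquad (11)$$

The following are consequences.

$$\text{Bird} \Rightarrow \neg\text{Emu} ,$$

$$\Rightarrow \neg\text{Emu} .$$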



We can conclude that “typically, birds aren’t emus” and “typically, things aren’t emus”. To show the
first is a consequence using Theorem 4.6, notice $U = (\text{Emu} \to \text{Bird})$, $I_V = \emptyset$, and try to show the rules
together with $\text{Bird} \Rightarrow \text{Emu}$ are inconsistent. This follows because the rules themselves are consistent and

$$U \wedge (\text{Emu} \vee \text{Bird}) \wedge (\text{Emu} \to \neg\text{Flies}) \wedge (\text{Bird} \to \text{Flies}) \wedge (\text{Bird} \to \text{Emu})$$

is unsatisfiable.
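
The satisfiability checks in this example are small enough to run by brute force. The sketch below is an assumption-laden illustration, not the efficient algorithm of Figure 3: it takes the subset test to have the form of the displayed formula (for every non-empty set $J$ of defaults, $U$ together with the disjunction of the antecedents in $J$ and the material implications of $J$ must be satisfiable), it ignores possibility operators since $I_V$ is empty here, and it enumerates all truth assignments and all subsets. The function names are hypothetical.

```python
from itertools import combinations, product

ATOMS = ("Emu", "Bird", "Flies")

def satisfiable(formula):
    """Brute-force satisfiability over all truth assignments to ATOMS."""
    return any(formula(dict(zip(ATOMS, values)))
               for values in product([True, False], repeat=len(ATOMS)))

def defaults_consistent(u, defaults):
    """Subset test: for every non-empty J, U & (OR A_i) & AND (A_i -> B_i) is satisfiable."""
    for size in range(1, len(defaults) + 1):
        for subset in combinations(defaults, size):
            def clause(w, subset=subset):
                return (u(w)
                        and any(a(w) for a, _ in subset)
                        and all((not a(w)) or b(w) for a, b in subset))
            if not satisfiable(clause):
                return False
    return True

emu = lambda w: w["Emu"]
bird = lambda w: w["Bird"]
flies = lambda w: w["Flies"]
not_flies = lambda w: not w["Flies"]
always = lambda w: True

u = lambda w: (not w["Emu"]) or w["Bird"]      # U = (Emu -> Bird)
rules = [(emu, not_flies), (bird, flies)]      # rules (10) and (11)

print(defaults_consistent(u, rules))                     # the rules alone
print(defaults_consistent(u, rules + [(bird, emu)]))     # adding Bird => Emu
print(defaults_consistent(u, rules + [(always, emu)]))   # adding => Emu
```

The rules alone pass the test, while adding either $\text{Bird} \Rightarrow \text{Emu}$ or $\Rightarrow \text{Emu}$ fails it, which is the pattern used above to obtain the two consequences.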

If we take the $O(\epsilon)$ semantics of the default operator literally then we could conclude, since $\epsilon$ is
infinitesimal, that “no birds are emus”, or “nothing is an emu”. The real intent of the semantics, however,
is about approximations for $\epsilon$ small. So instead we should conclude that the emu is just an uncommon
or non-typical bird, which in reality is true of emus. The approximate probabilistic semantics does not 
cause subclasses to vanish; but it may cause you to deduce some subclasses must be non-typical. 


5.2 Can Joe read and write 

The importance of independence in default reasoning, and plausible reasoning generally, has been underlined
by Pearl in his simple problem “can Joe read and write?” [7, Sect. 10.3]. This is a good example of
why general transitivity should not hold for default reasoning. A twist is also given at the end to show
how likelihood reasoning can complement default reasoning.

Pearl introduces the propositions (I have altered the symbols)

    Over-7    Joe is over 7 years old,
    RdWr      Joe can read and write,
    EngPrf    Joe’s father is a Professor of English,
    Shakes    Joe can recite passages from Shakespeare,

and the default rules (expressed in QDP)

$$\begin{array}{l}
\text{RdWr} \Rightarrow \text{Over-7} , \\
\text{EngPrf} \Rightarrow \text{RdWr} , \\
\text{Shakes} \Rightarrow \text{RdWr} .
\end{array} \qquad (12)$$
Let $\Delta_{literacy}$ denote ruleset (12). Pearl also assumes that Joe is over 6 years old and is not retarded, so
that the default rules above seem reasonable.

Given, in addition, that Joe recites Shakespeare, Pearl argues that a reasonable conclusion is that Joe
is over seven years old. That is, we want to be able to infer the default rule

$$\text{Shakes} \Rightarrow \text{Over-7} . \qquad (13)$$

On the other hand, given that Joe’s father is a Professor of English, it is not a reasonable conclusion that
Joe is over seven years old. The argument is that Joe’s father’s profession adequately explains Joe’s
literacy, so we don’t need the more common explanation that Joe is over seven years old. We do not want
to be able to infer the default rule

$$\text{EngPrf} \Rightarrow \text{Over-7} . \qquad (14)$$

The problem with the formulation at present is that the constraints on Shakes and EngPrf are
syntactically identical, but we hope to infer conflicting default rules for them. In QDP (and in NP) it
happens that neither default rule (13) nor (14) can be derived. We do get, however, that

$$\Delta_{literacy} \models_{QDP} \text{EngPrf} \Rightarrow \text{Over-7} \leftrightarrow (\text{EngPrf} \wedge \text{RdWr}) \Rightarrow \text{Over-7} ,$$

$$\Delta_{literacy} \models_{QDP} \text{Shakes} \Rightarrow \text{Over-7} \leftrightarrow (\text{Shakes} \wedge \text{RdWr}) \Rightarrow \text{Over-7} . \qquad (15)$$

The problem as it stands is underconstrained. So, what information is missing?

Pearl’s solution to the problem introduces the notion of causality. For instance, Joe’s literacy is a partial
cause (and the only direct one occurring in the formulation) of Joe being able to recite Shakespeare. What
Pearl alludes to but never explicitly mentions is the causal network (a Directed Acyclic Graph (DAG)
[35]) given in Figure 6. In this network, arcs correspond to the intuitive notion of “can cause”.

As Pearl and Verma explain, such a causal network provides information about independence [35,
definition for DAGD, p. 376]. It should be pointed out that the notion of causality is merely incidental
to their analysis: it serves as a useful, intuitive focus for acquiring knowledge about independence. We
can subsequently apply the dependence information so obtained to default and likelihood reasoning using
Lemma 4.3.

Applying Pearl and Verma’s technique of deducing independence relations to Figure 6, we get that
Joe’s Shakespearean recital is independent of Joe being over seven, given he is literate. In QDP, it follows
that

$$\text{RdWr} \Rightarrow \text{Over-7} \leftrightarrow (\text{Shakes} \wedge \text{RdWr}) \Rightarrow \text{Over-7} .$$

Let us denote by $\Gamma_I$ the dependence information obtainable from Figure 6. Together with the default
conclusion (15), we get

$$\Delta_{literacy} \cup \Gamma_I \models_{QDP} \text{Shakes} \Rightarrow \text{Over-7} .$$

The same does not hold for EngPrf, however, because in contrast we get that Joe’s father’s profession
is not independent of Joe being over seven, given Joe is literate. Because of this, the truth or falsehood
of default rule (14) is undetermined from $\Delta_{literacy}$ and $\Gamma_I$. But, if we were also told that it is likely for
a child of a Professor of English to be under seven years old and literate, then default rule (14) becomes
false as required. That is,$^4$

$$\Delta_{literacy} \cup \{\text{EngPrf} \succ (\text{RdWr} \wedge \neg\text{Over-7})\} \models_{QDP} \neg(\text{EngPrf} \Rightarrow \text{Over-7}) .$$


$^4$ Derive this result as follows. From $\text{EngPrf} \succ (\text{RdWr} \wedge \neg\text{Over-7})$, the conditioned version of theorem T10, and the
conditioned version of the theorem given in Equation (2), infer $\neg(\text{EngPrf} \wedge \text{RdWr} \Rightarrow \text{Over-7})$. Finally, combine this with
default rule (15).
5.3 The Yale shooting problem


A second problem that needs to incorporate independence for a solution is the Yale shooting problem [8]. 
This problem has been the subject of considerable discussion in AI, and it is beyond the scope of this 
paper to give a reasonable survey. In this section, the specific solution of Delgrande [36, Section 6.2] is 
considered. In probabilistic reasoning it is important to differentiate between what is currently known, 
and what is not. However, the situation calculus, in which the Yale shooting problem is usually presented, 
allows the representation of knowledge about static properties of a state but represses the representation 
of knowledge about events. This causes problems in the subsequent representation of defaults, which we 
discuss below. 

The Yale shooting problem can be presented briefly as follows: a gun is loaded; one waits for a moment; 
a shot is fired. We should conclude by default that the person is dead, assuming, of course, the gun was 
well aimed at the person, etc. Early default reasoning systems could not make this conclusion; during the 
wait, the gun would not stay loaded by default. 

Delgrande [36, Section 6.2] initially suggested a situation calculus representation of this problem in 
NP that in QDP becomes: 


$$ \Box T(\text{Alive}, S_0) , $$

$$ \Box T(\text{Loaded}, \text{Result}(\text{Load}, s)) , $$

$$ T(\text{Loaded}, s) \Rightarrow T(\text{Dead}, \text{Result}(\text{Shoot}, s)) , $$

$$ T(f, s) \Rightarrow T(f, \text{Result}(e, s)) . $$

Variables are given by $e$, $f$ and $s$, and state $S_0$ is some constant starting state. The first sentence reads 
“Alive is necessarily true in state $S_0$”, the third “if Loaded is true in some state $s$ then typically Dead 
will be true in the state resulting from a Shoot in state $s$”, etc. Assume the result state following $S_n$ is denoted $S_{n+1}$. To 
adequately handle the shooting problem we now wish to infer that, contingent on a certain sequence of 
events taking place, a death will occur: 

$$ T(\text{Load}, S_0) \wedge T(\text{Wait}, S_1) \wedge T(\text{Shoot}, S_2) \Rightarrow T(\text{Dead}, S_3) . $$

As Delgrande points out, this formulation cannot be correct. From the second sentence and theorem T6' 
we get 

$$ \neg T(\text{Loaded}, s) \Rightarrow T(\text{Loaded}, \text{Result}(\text{Load}, s)) , $$

and together with an instance of the fourth sentence (taking $f$ to be the property of being unloaded), 

$$ \neg T(\text{Loaded}, s) \Rightarrow \neg T(\text{Loaded}, \text{Result}(\text{Load}, s)) , $$

from theorem T7' we get 

$$ \Box T(\text{Loaded}, s) . $$

That is, the gun is always loaded! If we added an Unload event to the above formulation that resulted in 
the gun being unloaded, we could similarly deduce that the gun is always unloaded! 

Delgrande suggests repairing this conflicting state of affairs by changing the last sentence to (assuming 
that equality is introduced) 

$$ (f = \text{Alive}) \vee (T(f, s) \Rightarrow T(f, \text{Result}(e, s))) , $$

$$ (e = \text{Shoot}) \vee (T(\text{Alive}, s) \Rightarrow T(\text{Alive}, \text{Result}(e, s))) , $$

which together say people do not tend to remain alive if they are shot, or changing the second last to 

$$ (T(\text{Alive}, s) \wedge T(\text{Loaded}, s)) \Rightarrow T(\text{Dead}, \text{Result}(\text{Shoot}, s)) . $$




In addition, we will have to take this kind of evasive action for every event type. Adopting the first 
strategy, the simple concept “things tend to stay the same” is starting to look decidedly lengthy. We are 
required to explicitly detail all those exceptions default reasoning is supposed to circumvent. The second 
strategy seems to introduce an unnecessary complication: if you shoot a dead person they will remain 
dead, so why bother specifying that they should be alive before the shooting? 

The real problem lies with the representation of knowledge about events. Without knowing which event 
occurs at a state, we know things will tend to stay the same. Once we know which particular event occurs, 
however, we also know for sure that certain things will change. The antecedents in the conditionals in 
Delgrande’s formulation need to be qualified with knowledge about events to block the conflict between 
the second and fourth sentences. We do this by modifying the sentences to allow explicit representation 
of knowledge about events: 


$$ \Box T(\text{Alive}, S_0) , $$

$$ \Box (T(\text{Load}, s) \rightarrow T(\text{Loaded}, \text{Next}(s))) , $$

$$ T(\text{Loaded}, s) \wedge T(\text{Shoot}, s) \Rightarrow T(\text{Dead}, \text{Next}(s)) , $$

$$ T(f, s) \Rightarrow T(f, \text{Next}(s)) , $$

where $T(e, s)$ for an event $e$ such as Shoot denotes that it is known that the event $e$ occurred in 
situation $s$, and $\text{Next}(s)$ denotes the state after state $s$. Denote this set of sentences by $\Delta_{\text{shooting}}$. 

But with the problem as formulated in $\Delta_{\text{shooting}}$, the required result is not forthcoming. Again we need 
information about relevance to show how the redrafted sentences can have their antecedents sufficiently 
specialised. 

First, the following can be inferred from the third rule in $\Delta_{\text{shooting}}$ given that if a loaded gun is shot 
at someone, then events strictly prior to the shooting are independent of possible death: 

$$ T(\text{Load}, S_0) \wedge T(\text{Wait}, S_1) \wedge T(\text{Shoot}, S_2) \wedge T(\text{Loaded}, S_1) \wedge T(\text{Loaded}, S_2) \Rightarrow T(\text{Dead}, S_3) . $$

Second, the following can be inferred from the fourth rule in $\Delta_{\text{shooting}}$ given that whether a gun stays 
loaded is only dependent on prior Unload or Shoot events: 

$$ T(\text{Load}, S_0) \wedge T(\text{Wait}, S_1) \wedge T(\text{Shoot}, S_2) \wedge T(\text{Loaded}, S_1) \Rightarrow T(\text{Loaded}, S_2) . $$

This information about independence, call it $\Gamma_2$, is sufficient to yield the required result: 

$$ \Delta_{\text{shooting}} \cup \Gamma_2 \models_{\text{QDP}} T(\text{Load}, S_0) \wedge T(\text{Wait}, S_1) \wedge T(\text{Shoot}, S_2) \Rightarrow T(\text{Dead}, S_3) . $$

Notice that $\Gamma_2$ could have been obtained automatically using the “default” independence assumptions 
inherent in a maximum entropy approach [7]. 

The specification of $\Gamma_2$ can be seen to involve as much detail as Delgrande’s earlier suggestion. So 
where is the advantage? The defaults remain in a simple form, and the exceptions are instead coded in 
the modular form of causal (independence) information about events. 
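
To see how error accumulates along this chain of defaults, here is a minimal numeric sketch. It is not taken from the paper: it assumes the independence information licenses multiplying the two conditional probabilities, and the function name and the error values are hypothetical.

```python
def death_lower_bound(eps_persist: float, eps_shoot: float) -> float:
    """Lower bound on Pr(Dead at S3 | Load at S0, Wait at S1, Shoot at S2).

    Assumptions (illustrative only):
      * Load necessarily leaves the gun loaded at S1;
      * the gun stays loaded over the Wait with probability >= 1 - eps_persist;
      * shooting a loaded gun is fatal with probability >= 1 - eps_shoot;
      * independence lets the two conditional bounds be multiplied.
    """
    p_loaded_s1 = 1.0
    p_loaded_s2 = p_loaded_s1 * (1.0 - eps_persist)
    return p_loaded_s2 * (1.0 - eps_shoot)


bound = death_lower_bound(eps_persist=0.01, eps_shoot=0.02)
print(bound)
```

With these hypothetical errors the bound is no worse than one minus the sum of the two errors, which is the kind of additive error growth a quantitative treatment needs to keep track of.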


6 Further Comparisons 

This section compares the logics DP and QDP with some related approaches. Halpern and Rabin’s and 
Halpern and McAllester’s likelihood logics, and the influence graphs of Neufeld et al., are compared because they 
have also been motivated by probability. Comparisons with Adams’ conditional logic have been sprinkled 
throughout Section 4, and are not reiterated here. The last comparison given here is with Delgrande’s 
NP; this system had a historical influence on the logics DP and QDP. 
6.1 Halpern and Rabin’s likelihood logic 

Halpern and Rabin propose the unary likelihood operator L with semantics [14, p386] 


Lp is best thought of as saying “p is reasonably likely to be a consistent hypothesis.” 

This should not be confused with “p is reasonably likely”, the interpretation Halpern and McAllester give 
to Lp [9, p5]. 

For instance, suppose a lottery with 1,000,000 tickets is being held; then the following can be deduced 
by applying their Axiom AX 6 repeatedly: 

$$ L(\text{someone will win the lottery}) \;\longleftrightarrow\; \bigvee_{i=1}^{1{,}000{,}000} L(\text{person } i \text{ will win the lottery}) . \qquad (16) $$

The right hand side of this equivalence reads, there exists a particular person who is likely to win the 
lottery. In the Oxford dictionary sense of the word “likely”, this is certainly not true before the lottery 
is held. So in the Halpern-McAllester interpretation, the sentence (16) above can be interpreted as 
$\text{true} \leftrightarrow \text{false}$. This is a variant of the lottery “paradox”. Because they assume that likelihood reasoning 
is precise, they conclude that the Halpern-Rabin interpretation must be more appropriate. 

By contrast, in the framework proposed here it is taken for granted that likelihood reasoning may be 
imprecise. As explained after Definition 4.5, QDP suffers from the lottery “paradox” in a sense, but it is 
viewed as an anomaly, an inherent consequence of modelling imprecise reasoning with a precise logic. Of 
course, such anomalies can be avoided by either using heuristics about plausible reasoning (“don’t make 
too many assumptions”), or by resorting to numeric methods which allow more careful tallying of degrees 
of imprecision. 
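
The numeric tallying mentioned above can be illustrated on the lottery itself. The following is a minimal sketch, assuming a uniform lottery and a hypothetical threshold above which a proposition counts as "likely"; neither the function name nor the threshold comes from the paper.

```python
from fractions import Fraction


def lottery_likelihoods(n_tickets: int, threshold: Fraction) -> dict:
    """Compare 'someone wins' with 'person i wins' under a uniform lottery.

    `threshold` is a hypothetical cut-off for calling a proposition likely.
    """
    p_ticket = Fraction(1, n_tickets)   # Pr(person i wins), same for every i
    p_someone = n_tickets * p_ticket    # exactly one ticket wins
    return {
        "someone_likely": p_someone >= threshold,
        "any_individual_likely": p_ticket >= threshold,
        "p_someone": p_someone,
        "p_ticket": p_ticket,
    }


result = lottery_likelihoods(1_000_000, Fraction(1, 100))
print(result["someone_likely"], result["any_individual_likely"])  # True False
```

Exact fractions are used so that the comparison with the threshold is not blurred by rounding. The disjunction passes the threshold while no disjunct does, which is the numeric face of the anomaly.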

Notice that interpreting $Lp$ to mean “$p$ is a consistent hypothesis” yields the following transformation to 
QDP: $Lp \mapsto \Diamond p$, and $Gp \mapsto \Box p$. Indeed, their axioms on non-iterated modalities each have a corresponding 
theorem in QDP. 

Halpern and Rabin propose instead that iterated modalities of the form $L^{*}Gp$ be used to model “$p$ is 
reasonably likely”, and they give soundness results to support their claim. There is, however, a serious 
methodological problem with this approach: knowledge expressed in the form they propose is non-modular 
and cumbersome. A sentence such as “$P_1$ is reasonably likely given $P_3$” is represented in QDP simply as 
$P_3 \leadsto P_1$. In their logic it translates to 

— i£t~>Pi A ~~*GP2 A ■’GPi A GP 3 ^ LGP\ 


in one situation, and 

“i GP\ A -»G-»P2 A “iGPi A GP 3 ^ LGPi 

in another. These cumbersome translations occur because, as they explain [9, p7], their representation has 
no means of making likelihood contingent on what is currently known (for instance, by using conditioning, 
the role played by the left hand side of the “$\leadsto$” operator). Worse still, if the model (and consequently 
the set of atomic propositions used) is extended, the appropriate translation must be extended as well. 
In addition, Halpern and Rabin give no evidence that non-trivial theorems hold about iterated modalities 
of the form $L^{*}G$. 

6.2 Neufeld and Poole’s favouring formalism 

Neufeld et al. present influence graphs, a qualitative system for reasoning about favouring [20] that is 
related to Suppes’ causal algebra [37]. $B$ favours $A$ when $\Pr(A \mid B) > \Pr(A)$. It was argued in Section 4.5 
that favouring provides an important complement to the logics presented here. 
Favouring alone, however, is not sufficient information on which to base a decision. This stems from 
the fact that favouring is for reasoning about shift in belief and not current strength in belief. For instance, 
it is well known now that smoking favours cancer (that is, a smoker is more likely to have cancer than 
a non-smoker). But the knowledge that a person smokes is not sufficient evidence on which to base a 
conclusion that the person has cancer. It merely provides an additional degree of support for such a 
conclusion. 
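
The gap between shift in belief and strength of belief is easy to see with numbers. The figures below are purely hypothetical (they are not epidemiological data and do not come from the paper); the sketch only applies the definition of favouring, $\Pr(A \mid B) > \Pr(A)$.

```python
def total_probability(p_a_given_b: float, p_a_given_not_b: float, p_b: float) -> float:
    """Pr(A) by the law of total probability."""
    return p_a_given_b * p_b + p_a_given_not_b * (1.0 - p_b)


def favours(p_a_given_b: float, p_a: float) -> bool:
    """B favours A when Pr(A | B) > Pr(A)."""
    return p_a_given_b > p_a


# Hypothetical values chosen only for illustration.
P_SMOKER = 0.2
P_CANCER_GIVEN_SMOKER = 0.10
P_CANCER_GIVEN_NONSMOKER = 0.01

p_cancer = total_probability(P_CANCER_GIVEN_SMOKER, P_CANCER_GIVEN_NONSMOKER, P_SMOKER)
smoking_favours_cancer = favours(P_CANCER_GIVEN_SMOKER, p_cancer)
cancer_probable_given_smoker = P_CANCER_GIVEN_SMOKER > 0.5
print(smoking_favours_cancer, cancer_probable_given_smoker)  # True False
```

With these numbers smoking favours cancer, yet the conditional probability stays far below one half, so favouring by itself does not license the conclusion.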

6.3 Delgrande’s conditional logic NP 

There is a strong correspondence between the theorems of QDP and Delgrande’s NP [1]. The only axiom 
of NP that is not also a theorem of QDP is the CV axiom given by 

$$ \neg(A \Rightarrow B) \rightarrow (A \Rightarrow C) \rightarrow (A \wedge \neg B) \Rightarrow C , $$

although this is similar to the QDP theorem T14'. Notice, however, that by adapting T14' we get that 

$$ \models_{\text{DP}} \neg(A \Rightarrow_{\epsilon\delta/2} B) \rightarrow (A \Rightarrow_{\epsilon} C) \rightarrow (A \wedge \neg B) \Rightarrow_{\delta} C . $$

This version of the CV axiom does not also become a theorem in QDP because the first default has an 
error that is of a different order of magnitude from the second two defaults ($\epsilon\delta$ compared with $\epsilon$ and $\delta$). 

Also, necessity is introduced into QDP in a very different manner. Nevertheless, theorems 
involving necessity in NP given in [1] are also theorems of QDP. Consequently, almost every theorem 
of NP that is a sentence of QDP is also a theorem of QDP. 

7 Conclusion 

This paper has examined the problem of reasoning about defaults and likelihood from a probabilistic 
perspective. The presentation has been one of theoretical analysis, comparison with existing systems, 
and review of anecdotal examples. The approach developed has extended some existing systems [19, 7, 
1] and put some others in a clearer perspective [14]. This highlighted the approximate nature of the 
reasoning forms, the duality between them, and the need for complementary reasoning about relevance 
and error propagation. Algorithms have also been presented for determining some types of consistency 
and consequence for both logics, qualitative and quantitative. 

The following research issues give some idea of how this area might be further developed. 

Causality, independence [7] and favouring [20] play a complementary but vital role to default and 
likelihood reasoning. They help in the determination of relevance, for the derivation of plausible rules 
applicable to a system’s current context. Suppose we have separate information about relevance and de- 
faults. How might reasoning about both these forms be integrated? For instance, how can the consistency 
and consequence algorithms be interfaced with algorithms for reasoning about independence? 

There is a remarkable similarity between Delgrande’s conditional logic NP and the probabilistic logics 
presented here. With the necessity and possibility operators, the logics presented here have an ability to 
express sentences roughly in the realm of autoepistemic or default logics. What are the relationships to 
these other approaches? 

How should the effect of the decision context be integrated? For instance, one would like to be able 
to obtain the default reasoning structure present in the layered control systems of Brooksian robots [38] , 
where each layer is intended to handle a different class of decision problems. How might these layered 
systems be developed? 

Given that defaults and likelihoods have been represented here as probabilistic rules, how might they 
be learned from data? Machine learning techniques for rule induction have been developed, but these only 
allow one particular propositional symbol (or concept) in the consequent of the rule. Some methods 
are described in [39, 40, 25]. To learn a set of defaults and likelihoods, more general approaches are required 
that simultaneously learn rules with a variety of different propositional symbols in the consequent, as 
found in [41]. 

At what point does qualitative reasoning of QDP have to be augmented with quantitative reasoning of 
DP to produce reliable results? Furthermore, when do the approximations inherent in DP break down so 
that a system needs to be developed using more thorough probabilistic reasoning? It may be necessary to 
reason about uncertainty using approximate numeric techniques, and to use the plausible logics developed 
here merely at the man-machine interface. For instance, one observable use of default and likelihood 
reasoning in people is explanation and presentation of results. 

Implementation and application to real problems is clearly one important way to explore these plausible 
reasoning forms further. 
Appendix Proofs of Lemmas and Theorems 


Proof of Theorem 4.1 First, substitution instances of propositional logic hold for DP because the 
interpretation of theorems for DP is given in terms of propositional logic (“not”, “and”, etc.). Suppose 
$E \in$ QDP is some substitution instance of a theorem of propositional logic. Consider $E' \in$ DP obtained by 
replacing each qualitative default and likelihood operator in $E$ with its quantitative counterpart, for some choice of error parameters. $E'$ is also a substitution instance of propositional 
logic, so it is a theorem of DP. As this holds for arbitrary error parameters, $E$ is a theorem of QDP. 

Second, equivalences can be substituted. Suppose $\Pr(A \leftrightarrow B) = 1$, then 

$$ \Pr(C(A)) = \Pr(C(A) \wedge (A \leftrightarrow B)) = \Pr(C(B) \wedge (A \leftrightarrow B)) = \Pr(C(B)) , $$

and the result follows from the definition of a theorem in DP. A similar proof applies for QDP. □ 

Proof of Lemma 4.2 Any sentence that holds for an arbitrary probability distribution must hold for 
an arbitrary probability distribution conditioned on some $C$ given that $\Pr(C) > 0$. That is, given that 
$\Pr(C) > 0$, we can make the transformations 

$$ \Pr(A) \mapsto \Pr(A \mid C) , $$

$$ \Pr(A \mid B) \mapsto \Pr(A \mid B \wedge C) , $$

and the sentence must still hold for any arbitrary probability distribution. This corresponds to the 
transformations given in the lemma. Notice there is no confusion in applying the transformations because 
the operators do not nest. □ 

Proof of Lemma 4.4 First, notice that the order of magnitude definition of Definition 4.5 applies if and 
only if there exist constants $c_i$ for $i \in I_A$ and $d_i$ for $i \in I_C$ and $\eta$ such that for all $\epsilon < \eta$ and probability 
distributions $\Pr$, 

$$ \models_{\Pr} \Box U \wedge \bigwedge_{i \in I_V} \Diamond V_i \wedge \bigwedge_{i \in I_A} A_i \Rightarrow_{c_i \epsilon} B_i \;\rightarrow\; \bigvee_{i \in I_C} C_i \Rightarrow_{d_i \epsilon} D_i . \qquad (17) $$

To show the only if part of the theorem, assume the $\delta$, $\epsilon$ condition in the lemma holds. Then let $c_i = 1$ 
for $i \in I_A$ and $d_i = \delta$ for $i \in I_C$, so by the above, the clause is a theorem of QDP. 

To show the if part of the theorem, assume the clause is a theorem of QDP so constants $c_i$ for $i \in I_A$ 
and $d_i$ for $i \in I_C$ and $\eta$ exist as above. Let $\alpha = \min_{i \in I_A} c_i$, and $\delta = \max_{i \in I_C} d_i / \alpha$, and $\eta' = \eta\alpha$. Pick any 
$\epsilon' < \eta'$ and note $\epsilon = \epsilon'/\alpha < \eta$. We now have that $1 - c_i \epsilon \le 1 - \epsilon'$, and $1 - d_i \epsilon \ge 1 - \delta\epsilon'$. So if a probability 
distribution $\Pr$ satisfies the clause (17) using $\eta$ and $\epsilon$, then it also satisfies the DP clause in the theorem 
using $\eta'$ and $\epsilon'$. So this clause is satisfied for every distribution. □ 


Proof of Lemma 4.5 The proof proceeds as for Lemma 4.4 but using 

f=p r DU A i£l v o Vi A i£l A Ai Pi * V,*€/ c Gi P» > 

instead of formula (17). There is a difference in showing the if part of the theorem; $\delta$ is now constructed in 
an inverse manner. Let 

a — maxi e j c ^i/cl and 6 = mirii^i c , 

and proceed as before, noticing that we are dealing with quantities such as $\delta(\epsilon')^{n_i}$ rather than $1 - \delta\epsilon'$. □ 
Proof of Theorem 4.6 part 1 First, assume $D$ is consistent and we shall prove both unsatisfiability 
conditions fail. With $D$ consistent, $\Pr(U \wedge V_j) > 0$, so $U \wedge V_j$ is satisfiable, and the first unsatisfiability 
condition fails. To show the second unsatisfiability condition fails for each $J$, it is sufficient to show it fails 
for $J = I_A$, because if $D$ is consistent, then any subset of $D$ is consistent, so, correspondingly, the same 
reasoning applies for any subset $J$ of $I_A$. We shall show this failure by contradiction. Assume the second 
unsatisfiability condition holds for $J = I_A$, then for any probability distribution such that $\Pr(U) = 1$, 
$\Pr(\bigvee_{i \in J} A_i \wedge \neg B_i) = \Pr(\bigvee_{i \in J} A_i)$. Therefore, 

$$ \max_{i \in J} \Pr(A_i \wedge \neg B_i) \;\ge\; \frac{\sum_{i \in J} \Pr(A_i \wedge \neg B_i)}{|J|} \;\ge\; \frac{\Pr(\bigvee_{i \in J} A_i \wedge \neg B_i)}{|J|} \;=\; \frac{\Pr(\bigvee_{i \in J} A_i)}{|J|} \;\ge\; \frac{\Pr(A_j)}{|J|} $$

for any $j$. Therefore either $\Pr(A_j) = 0$ for all $j \in I_A$, or for at least one $j \in I_A$, $\Pr(\neg B_j \mid A_j) \ge \frac{1}{|I_A|}$. This 
gives the contradiction since $\epsilon_j < \frac{1}{|I_A|}$. 

Second, assume both unsatisfiability conditions fail and we shall prove $D$ is consistent. Notice that because 
the second unsatisfiability condition fails for every $J \subseteq I_A$, there exists an ordered subset of $I_A$ given by 
$i_1, \ldots, i_m$ (where $m = |I_A|$), such that $U \wedge A_{i_j} \wedge \bigwedge_{k \ge j} (A_{i_k} \rightarrow B_{i_k})$ is satisfiable for $j = 1, \ldots, m$. Let truth 
assignment $t_j$ demonstrate this satisfiability for $j = 1, \ldots, m$. Also, for each $j \in I_V$, let a second truth assignment demonstrate 
the satisfiability of $U \wedge V_j$. These second assignments exist because the first unsatisfiability 
condition fails. Now define the probability distribution $\Pr$ as 


Pr(C) 


e (i-omc) n 

fc — 1|*« »|fn ^*l>***t^ “ ^ 




\Iv\ 


E tyw • 

jf€lv 


where the truth assignment $t(C)$ takes the value 1 if $C$ is satisfied by $t$, and 0 otherwise. By construction, 
this is a well-defined probability distribution that satisfies all the right inequalities for arbitrary $\epsilon_i < 1$. 
□ 


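Proof of Theorem 4.6 part 2 To show the only if part of the theorem, assume $D \wedge (C \Rightarrow_{\delta} \neg B)$ is 
consistent. If $C \Rightarrow_{\delta} \neg B$ with $\delta < \frac{1}{2}$ then clearly $\neg(C \Rightarrow_{\delta} B)$, so $D \wedge \neg(C \Rightarrow_{\delta} B)$ is consistent, so it must 
be false that $\models_{\text{DP}} D \rightarrow (C \Rightarrow_{\delta} B)$. 

To show the if part of the theorem, assume $D \wedge (C \Rightarrow \neg B)$ is inconsistent. If $D$ is inconsistent, then 
clearly $D \rightarrow (C \Rightarrow B)$. If $D$ is consistent, by part 1 of the theorem and the consistency assumptions 
just made, it must follow that 

$$ \models \neg\Big( U \wedge \big(\bigvee_{i \in J} A_i \vee C\big) \wedge \bigwedge_{i \in J} (A_i \rightarrow B_i) \wedge (C \rightarrow \neg B) \Big) , $$

for some $J \subseteq I_A$. From the second half of the proof for Theorem 4.6 part 1, this follows for any $\delta < 1$, 
not just for $\delta$ below the bound assumed there. Noting that $(\bigvee_{i \in J} A_i \vee C)$ is equivalent to $(\bigvee_{i \in J} A_i \vee (C \wedge \bigwedge_{i \in J} \neg A_i))$ and taking this 
disjunction out through the negation, it follows that 

$$ \models U \wedge \big(\bigvee_{i \in J} A_i\big) \wedge \bigwedge_{i \in J} (A_i \rightarrow B_i) \rightarrow C \wedge B , $$

$$ \models U \wedge C \wedge \neg B \rightarrow \bigvee_{i \in J} A_i \wedge \neg B_i . $$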

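Notice that if $\models E \rightarrow F$ then $\Pr(F) \ge \Pr(E)$ for any probability distribution $\Pr$. So for any distribution 
$\Pr$ such that $\Pr(U) = 1$, 

$$ \Pr(C \wedge B) \;\ge\; \Pr\Big(\big(\bigvee_{i \in J} A_i\big) \wedge \bigwedge_{i \in J} (A_i \rightarrow B_i)\Big) \;=\; \Pr\big(\bigvee_{i \in J} A_i\big) - \Pr\big(\bigvee_{i \in J} A_i \wedge \neg B_i\big) , $$

and $\Pr(\bigvee_{i \in J} A_i \wedge \neg B_i) \ge \Pr(C \wedge \neg B)$. Let $\Pr$ be any probability distribution satisfying 
$\Box U \wedge \bigwedge_{i \in I_V} \Diamond V_i \wedge \bigwedge_{i \in I_A} (A_i \Rightarrow_{\epsilon_i} B_i)$. So $\Pr(A_j \wedge \neg B_j) \le \epsilon_j \Pr(A_j) \le \epsilon_j \Pr(\bigvee_{i \in J} A_i)$ for any $j \in J$ even if $\Pr(A_j) = 0$. Consequently, 

$$ \Pr\big(\bigvee_{i \in J} A_i \wedge \neg B_i\big) \;\le\; \sum_{i \in J} \Pr(A_i \wedge \neg B_i) \;\le\; \Big(\sum_{i \in J} \epsilon_i\Big) \Pr\big(\bigvee_{i \in J} A_i\big) . $$

Let $\delta = \sum_{i \in J} \epsilon_i$. So 

$$ \Pr(C \wedge B) \;\ge\; \Big(\frac{1}{\delta} - 1\Big) \Pr\big(\bigvee_{i \in J} A_i \wedge \neg B_i\big) \;\ge\; \frac{1 - \delta}{\delta} \Pr(C \wedge \neg B) , $$

which is the required inequality to show $C \Rightarrow_{\delta} B$. □ 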
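
The additive error bound in this proof, where the error of the consequence is the sum of the errors of the defaults used, can be spot-checked numerically. The sketch below is an illustration under assumptions: the instance (defaults c ⇒ a and c ⇒ b, consequence c ⇒ a ∧ b) and the random sampling scheme are chosen here and do not appear in the paper. For each sampled distribution it takes the tightest errors of the two premises and measures how much slack the summed bound leaves.

```python
import random

# Worlds are truth assignments (a, b, c).
WORLDS = [(a, b, c) for a in (False, True) for b in (False, True) for c in (False, True)]


def random_distribution(rng: random.Random) -> dict:
    """A random probability distribution over the eight worlds."""
    weights = [rng.random() for _ in WORLDS]
    total = sum(weights)
    return {w: x / total for w, x in zip(WORLDS, weights)}


def cond(pr: dict, event, given) -> float:
    """Pr(event | given) by summation over worlds."""
    den = sum(p for w, p in pr.items() if given(w))
    num = sum(p for w, p in pr.items() if given(w) and event(w))
    return num / den


def worst_slack(trials: int = 1000, seed: int = 0) -> float:
    """Smallest value of (eps1 + eps2) - Pr(not(a and b) | c) over the samples."""
    rng = random.Random(seed)
    is_c = lambda w: w[2]
    worst = float("inf")
    for _ in range(trials):
        pr = random_distribution(rng)
        eps1 = cond(pr, lambda w: not w[0], is_c)   # tightest error of c => a
        eps2 = cond(pr, lambda w: not w[1], is_c)   # tightest error of c => b
        err = cond(pr, lambda w: not (w[0] and w[1]), is_c)
        worst = min(worst, eps1 + eps2 - err)
    return worst


print(worst_slack() >= -1e-12)
```

A non-negative slack on every sample is what the proof predicts; the small tolerance only absorbs floating-point rounding.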

Proof of Theorem 4.6 parts 3 and 4 The sentence $\Diamond C$ is a consequence of $D$ if and only if $D \wedge \neg\Diamond C$ 
is inconsistent, by definition of consequence and inconsistency. Replacing $\neg\Diamond C$ by $\Box\neg C$ shows part 3 
holds. Similarly, part 4 holds but this time we can simplify the inconsistency of $D \wedge \Diamond\neg C$ by using part 1 
of the theorem. □ 

Proof of Corollary 4.6.1 We shall show that the algorithm runs to the end if and only if the 
sentence is consistent. By the theorem, it is sufficient to show it runs to the end if and only if for every 
$j \in I_V$, $U \wedge V_j$ is satisfiable and for every $J \subseteq I_A$, $U \wedge (\bigvee_{i \in J} A_i) \wedge \bigwedge_{i \in J} (A_i \rightarrow B_i)$ is satisfiable. 

Clearly Step 1 handles the first case correctly. 

Consider Step 3(a). Notice that if for $j \in J$, $U \wedge A_j \wedge \bigwedge_{i \in J} (A_i \rightarrow B_i)$ is satisfiable, then $U \wedge (\bigvee_{i \in J'} A_i) \wedge \bigwedge_{i \in J'} (A_i \rightarrow B_i)$ 
is satisfiable for every $J' \subseteq J$ containing $j$. So we now only need to consider subsets not 
containing $j$. The repeat loop in Step 3 simply uses this fact iteratively to eliminate each possible index 
$j$ from the original set $J$. So the loop terminates short if and only if this satisfiability fails, which means 
the original sentence was inconsistent. □ 

Proof of Theorem 4.7 part 1 First prove $I_{\min}$ exists and has a unique minimum. Notice $I_A$ is an 
upper bound on $I_{\min}$, so some (but not necessarily unique) $I_{\min}$ exists. Suppose a set $I'$ exists which is a 
subset of every possible $I_{\min}$, and that $U \wedge \bigwedge_{i \in I'} \neg A_i \wedge A_j \wedge B_j$ is unsatisfiable for some $j \in I_A - I'$. Then 
this unsatisfiability will also hold for any $I_{\min}$, so $j$ must also be in $I_{\min}$. So we can place $j$ in $I'$ too. If 
we start with $I' = \emptyset$ and iterate this operation to a fixed point, we clearly obtain the unique $I' = I_{\min}$ 
because an invariant of the operation is “any $I_{\min}$ must be a superset of $I'$”. 

Suppose the unsatisfiability condition fails, that is, for each $j \in I_V$, $U \wedge \bigwedge_{i \in I_{\min}} \neg A_i \wedge V_j$ is satisfiable. 
So there exist truth assignments demonstrating the satisfiability of these. There also exist truth assignments 
satisfying $U \wedge \bigwedge_{i \in I_{\min}} \neg A_i \wedge A_j \wedge B_j$ for $j \in I_A - I_{\min}$, by the definition of $I_{\min}$. Take a probability 
distribution that makes each assignment in the first set infinitesimally small, each assignment in the second 
set equiprobable, and any other truth assignment probability zero. So $\Pr(V_j) > 0$, and for $j \in I_A - I_{\min}$, 
$\Pr(B_j \mid A_j)$ is greater than or arbitrarily close to the probability of a single assignment in the second set, etc. This distribution demonstrates $D$ is 
consistent. 

Suppose $D$ is consistent. Consider any probability distribution $\Pr$ that demonstrates this. Let $I = 
\{ i \in I_A : \Pr(A_i) = 0 \}$. So $\Pr(U \wedge \bigwedge_{i \in I} \neg A_i) = 1$. Since $\Pr(V_j) > 0$ for each $j \in I_V$, it follows that 
$\Pr(U \wedge \bigwedge_{i \in I} \neg A_i \wedge V_j) > 0$ as well, so the corresponding propositional sentence must be satisfiable. Also, 
$\Pr(A_j) > 0$ for $j \in I_A - I$, so since $D$ is consistent $\Pr(A_j \wedge B_j) > 0$ and $\Pr(U \wedge \bigwedge_{i \in I} \neg A_i \wedge A_j \wedge B_j) > 0$, 
so the corresponding propositional sentence must be satisfiable. A side effect of this is that $I_{\min} \subseteq I$, 
therefore the above satisfiability conditions holding for $I$ also hold for $I_{\min}$, as required for the theorem. 
□ 

Proof of Theorem 4.7 part 2 First prove the only if part of the theorem. So assume $C \leadsto_{\delta} B$ is a 
consequence of $D$ for some $\delta$. It is sufficient to prove that if $D$ is consistent and $U \wedge \bigwedge_{i \in I_{\min}} \neg A_i \wedge C \wedge \neg B$ 
is satisfiable, then there exists an ordered subset of the indices in $I_A - I_{\min}$, $i_1, \ldots, i_h$, such that 
formulas (5) and (6) are true. Do this by contradiction. Suppose there does not exist such an ordered 
set of indices. Then there exists an ordered subset of the indices in $I_A - I_{\min}$, $I = i_1, \ldots, i_h$, such that 
formulas (5) are true, but formula (6) fails and there does not exist an index $i_{h+1}$ such that formula (5) 
applies for that index. Note that this occurs only if 

U A A Aj A Bj Aie/ ~^Ai A -«(C A B ) 

is satisfiable for every j € Ia — 7 min - 7, and 

U A iei^in Ai € / -*Ai A C A -»B 

is satisfiable. Call the set of |7a — I m in - 7| truth assignments satisfying the first form above Ti, and the 
truth assignment satisfying the second form TV By the definition of / m < n , we have that U A A 
Aj A Bj is satisfiable for j € 7. Call the set of |7| truth assignments satisfying this T 3 . Finally, since D 
is consistent, we also have that U A i£i m{n “ | A, A Vj is satisfiable for j £ Iv • Call the set of |7v| truth 
assignments satisfying this T 4 . Now for r) vanishingly small, consider the probability distribution Pr that 
makes truth assignments in T 4 have probability those in T 3 have probability , the one in T2 

have probability i)( 1 — q), those in T* have probability > and any other truth assignment have 

probability 0. This makes Pr(Bj\Aj) > ^ ^ or J € 7 a “7 mfn — 7, Pr(-»B[C) > 1 — J?, etc. Clearly, 
Pr with a suitable value of 77 can be used to demonstrate D A ^(C £$-/ B) is consistent for any / < 1. So 
we have proven the contradiction. 

Next prove the if part of the theorem. Clearly, consequence holds if $D$ is inconsistent. So assume it 
is consistent, and assume without loss of generality that $i_j = j$ for notational convenience. Consider any 
probability distribution $\Pr$ such that 

^•Pr A t£/ v oVi A|g/^ Ai £*he{ B* • 


So for each j € 7a — 7fittn, 

Pr(Aj A Bj) > -Ji-P r (Aj A -By) , (18) 

i - el- 
even if Pr(Aj) = 0. From part 1 of the theorem we also know that Pr(l7 A »€/*»<* "^i) = 80 this 

term can be effectively ignored in probability statements that follow. If h = 0, then from formula (6) it 
follows that Pr(C — * B)=l, which implies Pr(B | C) = 1, so the if part holds. Otherwise, h > 1. From 
formulas (5) and (6), we have that 

N U A *<h (Ak — ► Pit) — ► {C — ► B) , 

so 

^ Pr(A* A -»B*) > Pr(Vjt<fcj4* A “»Pfc) > Pr(CA-iB). (19) 

Also, there must exist sets Pj C {l t . . .,j — 1} such that formulas (5) can be replaced by 

(= U A iei min "'Ai A Aj A Bj AkePj —►(CAP), 


for j = 1, . . . , h. Then 


^Pr(A k ) > Pr(Vfc € p i Ajb) > Pr{Aj A Bj A -»(C A P)) . (20) 

*€Pi 

These inequalities are strung together below to produce the desired result. 





For j E 1,..., h, define 


3j = min e * , 

J k£Pj 


7 i = 

We shall prove by induction on j that 

h h 


1 — e 


-Ji + y' I* 


ACAP) + £ f ]T Pr(A,AP,) > ^Pr(A t A-P t ). (21) 

* = ?’ *=j * = 7 

Assume it is true for j + 1. Consider the case for j . Notice that by formula (18) 


i-^Pr(ji ; A Bj) + ££ Pr(A, A P,) 


/3, ' 


> Pr^A^O + ^^e.Pr^,) 


l€Pj 


> Pr(Aj A -, P ; ) + 7j Pr(Aj) . 

'ZPi 

Adding this to the inequality for the induction hypothesis, formula (21) with j + 1, we get that 


7 >Pr(Aj A Bj) + J2 nPr(A k AB t ACAB) + ^J- £ Pr(A, A P,) 

*=i+l *=> *** I : l€P fc) /<j 

A 

> £ Pr(A k A -P fc ) + yj Y Pr(A,) . 

k=j l£Pj 


So 


A A 

^2y k Pr(A k AB k ACAB) + J2¥ J2 *H*APi) 

k=j k=j P* t : 1tP kf l<j 

> Y, Pr ( A * A -Pk) + 7j ( Y, Pr ( A >) - Pr ( A i A B i A _, ( C ' A p )) 

*=; \J€Pi 

By formula (20), the induction step is proven. Notice that this same argument works for the base case of 
the induction proof, if we start at $j = h$ using $0 \ge 0$, so the induction proof is complete. 

Finally, for $j = 1$ in formula (21), we have that 

A A 

y^/y k Pr(A k A B k AC A B) > ^Pr(j4* A -*B k ) . 

Jk=i fc=l 



By formula (19), it follows that 


So 


Pr{B\C) > 


h-j 


1 + E*=i7* 

The right-hand side of this inequality gives the error propagation function / for this consequence. This 
can be evaluated using the definitions of 7 j , 0j and Pj given previously. Clearly, the smallest this error 
propagation function can be is when Pj = {1, . . j — 1} and f3j — e\ = e for j < h. In this case, a simple 
induction proof shows that 

* = 

Summing these over j and simplifying gives 

/ > 

e + (l-e)(l + i) 

In contrast, if Pj = 0, and Pj = e, then / > y. O 


IT > 




Proof of Theorem 4.7 parts 3 and 4 The same as for Theorem 4.6 parts 3 and 4. Notice, also, that 
$I_{\min}$ remains the same if a possibility is added to $D$. □ 

Proof of Corollary 4.7.1 The if part of the corollary follows from Theorem 4.7 part 2. Notice that 
if $P_j = \emptyset$ for each $j$, then the error propagation function developed in the proof of Theorem 4.7 part 2 
becomes 


1 + ELl 


i-ck 


This behaves linearly for small $\epsilon_k$. If any $P_j \neq \emptyset$, however, this linear behaviour no longer exists. □ 


Proof of Corollary 4.7.2 The repeat loop in Step 2 simply performs the construction described in the 
proof of Theorem 4.7 part 1. This iteratively builds up $I_{\min}$. The repeat loop terminates when for all 
$j \in I_A - I$, $U \wedge \bigwedge_{i \in I} \neg A_i \wedge A_j \wedge B_j$ is satisfiable. Otherwise, the algorithm is a direct implementation of 
Theorem 4.7 part 1. □ 

Proof of Corollary 4.7.3 The algorithm builds the ordered set of indices in turn. Clearly, if it fails at 
step 3(c), then no such ordered set can exist. □ 
Table 1: Theorem schemata (and duals) on “$\Box$” and “$\Diamond$” 


kd 

OA «-+ i=>o A 

o A F^o A 

□(i4-»B) -4 (* t A - t>'B) 

□(A -» B) — (p*-* A -► Fh e B) 

T7 

A A (=>•« ->^4) 

(F^e A V p>- e ~v4) 

T8 

oA A ( A =$- t B) — ► oB 

(oj4 A DB) — * A a>- e B 


Table 2: Theorem schemata (and duals) relating “$\Rightarrow$” and “$\leadsto$” to “$\Box$” and “$\Diamond$” 


T9 

A=> e A 

A -A 

T10 

A => t B - (j4 — » B) 

p*- e {A A B) — * .4 s*- e B 

Til 

(p-< A A ^B) — (=><+« {A A B) 

F»- e +d (.A V B) -* (p^ e -4 V pj-d B) 

T12 

^ * (^ ^ B 

F=>< -4 — » (F*-<t B —* A *>-d- e B ) 

T13 

-<4 — ♦ (ps^ B ► A ^j+4 B) 

F*. A — (j4 sss-rf B — B) 

T14 

FS>- e -4 -» (l=>« B — =>;u B) 

FJ-e ^4 — » {A s$- d B — * B) 

T15 

(A =>« C) A (B =>« C) -» (A V B) =>«+* C 

(.4 V B) ssj-e+j C — * (.A s*- e C V B aji-d C) 

T16 

{A A B) — * (->j4 =>e ->B — * B => is j4) 

F$- e (j4AB) — » (B ->A — ♦ B) 

T17 

{AW B => T C) -> (-4 =►, C V B =>< C) 

(.A C) A (B C) — * j4 V B sjk- C 


Table 3: Theorem schemata (and duals) on “$\Rightarrow$” and “$\leadsto$” 
T6' 

□(C A A — ► B) - {C=> t 

A -» C=> ; 

5) 

T7' 

(C A) A (C => e -«A) — ► -ioC 


Til' 

(C => ( A) A (C =>( B) 

C =>«+« (A 

AB) 

T12' 

C =>< A - 

— * O' =>,+« 


T13' 

C =>* A — ► (C =>6 B — ► 

C A A =£>+* 


T14' 

C R*- e A - (C =>> B — 

C A A =>2i 


S) 


Table 4: Conditioned theorem schemata 


Input: A QDP sentence 

$$ \Box U \wedge \bigwedge_{i \in I_V} \Diamond V_i \wedge \bigwedge_{i \in I_A} A_i \Rightarrow B_i . $$

Output: The consistency or inconsistency of the sentence. 

Algorithm: Check the two sets of satisfiability conditions in turn. 

1. If for some $j \in I_V$, $U \wedge V_j$ is unsatisfiable, return inconsistent. 

2. Let $J = I_A$. 

3. Repeat, 

(a) Find a $j \in J$ such that $U \wedge A_j \wedge \bigwedge_{i \in J} (A_i \rightarrow B_i)$ is satisfiable. 

(b) If no such $j$ can be found, return inconsistent. 

(c) Else, $J = J - \{j\}$. 

Until $J = \emptyset$. 

4. Return consistent. 


Figure 3: The defaults-consistency algorithm 
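
The steps of Figure 3 can be sketched in code as follows. This is a minimal illustration, not the authors' implementation: formulas are represented as Python predicates over truth assignments, satisfiability is tested by brute-force enumeration (a real system would call a propositional satisfiability procedure), and all names (`satisfiable`, `defaults_consistent`, the bird example) are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

World = Dict[str, bool]
Formula = Callable[[World], bool]
Default = Tuple[Formula, Formula]  # (A_i, B_i) for the default A_i => B_i


def satisfiable(atoms: Sequence[str], formulas: Sequence[Formula]) -> bool:
    """Brute-force test: does some truth assignment satisfy every formula?"""
    for values in product([False, True], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        if all(f(world) for f in formulas):
            return True
    return False


def implies(a: Formula, b: Formula) -> Formula:
    """Material implication A -> B."""
    return lambda w: (not a(w)) or b(w)


def defaults_consistent(atoms: Sequence[str], necessity: Formula,
                        possibilities: List[Formula],
                        defaults: List[Default]) -> bool:
    # Step 1: every possibility V_j must be satisfiable together with U.
    for v in possibilities:
        if not satisfiable(atoms, [necessity, v]):
            return False
    # Step 2: J starts as the full index set of the defaults.
    remaining = list(range(len(defaults)))
    # Step 3: repeatedly remove an index j whose antecedent is satisfiable
    # together with U and the material versions of the remaining defaults.
    while remaining:
        material = [implies(*defaults[i]) for i in remaining]
        found = next(
            (j for j in remaining
             if satisfiable(atoms, [necessity, defaults[j][0]] + material)),
            None,
        )
        if found is None:
            return False
        remaining.remove(found)
    # Step 4: the loop emptied J.
    return True


ATOMS = ["bird", "penguin", "fly"]
U = lambda w: (not w["penguin"]) or w["bird"]          # penguins are birds
BIRDS = [(lambda w: w["bird"], lambda w: w["fly"]),
         (lambda w: w["penguin"], lambda w: not w["fly"])]
CLASH = [(lambda w: w["bird"], lambda w: w["fly"]),
         (lambda w: w["bird"], lambda w: not w["fly"])]
print(defaults_consistent(ATOMS, U, [lambda w: w["penguin"]], BIRDS))  # True
print(defaults_consistent(ATOMS, U, [], CLASH))                        # False
```

Each pass of the loop makes at most one satisfiability call per remaining default, so the number of calls grows at most quadratically with the number of defaults.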
Input: A QDP sentence 

$$ \Box U \wedge \bigwedge_{i \in I_V} \Diamond V_i \wedge \bigwedge_{i \in I_A} A_i \leadsto^{n_i} B_i . $$

Output: The consistency or inconsistency of the sentence. 

Algorithm: First construct $I_{\min}$, then check the possibility conditions. 

1. Let $I = \emptyset$. 

2. Repeat, 

(a) Find some $j \in I_A - I$ such that $U \wedge \bigwedge_{i \in I} \neg A_i \wedge A_j \wedge B_j$ is unsatisfiable. 

(b) If some $j$ found and $\Diamond A_j$ is in the possibilities in the input sentence, then 
return inconsistent. 

(c) Else, if some $j$ found, $I = I \cup \{j\}$. 

Until no $j$ found. 

3. $I$ is now equal to $I_{\min}$. If for some $j \in I_V$, $U \wedge \bigwedge_{i \in I} \neg A_i \wedge V_j$ is unsatisfiable, return 
inconsistent. 

4. Else return consistent. 


Figure 4: The likelihood-consistency algorithm 
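
A sketch of Figure 4 in the same style follows. It is again only an illustration under assumptions: formulas are Python predicates, satisfiability is brute force, and the function and variable names are hypothetical. The early exit of step 2(b) is left out because step 3 reaches the same verdict once the offending index is in the set: the possibility then conflicts with the negated antecedent.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Set, Tuple

World = Dict[str, bool]
Formula = Callable[[World], bool]
Likelihood = Tuple[Formula, Formula]  # (A_i, B_i) for "B_i is likely given A_i"


def satisfiable(atoms: Sequence[str], formulas: Sequence[Formula]) -> bool:
    """Brute-force test: does some truth assignment satisfy every formula?"""
    for values in product([False, True], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        if all(f(world) for f in formulas):
            return True
    return False


def negate(a: Formula) -> Formula:
    return lambda w: not a(w)


def likelihood_consistent(atoms: Sequence[str], necessity: Formula,
                          possibilities: List[Formula],
                          likelihoods: List[Likelihood]) -> Tuple[bool, Set[int]]:
    """Return (consistent?, index set built in steps 1 and 2)."""
    # Steps 1-2: grow I with every index j for which
    # U and not-A_i (i in I) and A_j and B_j is unsatisfiable.
    i_set: Set[int] = set()
    while True:
        blocked = [negate(likelihoods[i][0]) for i in i_set]
        found = next(
            (j for j in range(len(likelihoods))
             if j not in i_set
             and not satisfiable(atoms, [necessity, *likelihoods[j], *blocked])),
            None,
        )
        if found is None:
            break
        i_set.add(found)
    # Step 3: each possibility must survive the forced negations.
    blocked = [negate(likelihoods[i][0]) for i in i_set]
    for v in possibilities:
        if not satisfiable(atoms, [necessity, v, *blocked]):
            return False, i_set
    # Step 4.
    return True, i_set


ATOMS = ["a", "b", "c"]
U = lambda w: not (w["a"] and w["b"])                  # a and b exclude each other
LIKELY = [(lambda w: w["a"], lambda w: w["b"]),        # b likely given a
          (lambda w: w["c"], lambda w: w["a"])]        # a likely given c
print(likelihood_consistent(ATOMS, U, [lambda w: w["b"]], LIKELY))
print(likelihood_consistent(ATOMS, U, [lambda w: w["c"]], LIKELY)[0])
```

In the example the first likelihood can only hold if its antecedent has probability zero, which in turn forces the second antecedent to zero, so asserting that c is possible makes the sentence inconsistent.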
Input: A likelihood $C \leadsto^{m} B$, a consistent QDP sentence 

$$ \Box U \wedge \bigwedge_{i \in I_V} \Diamond V_i \wedge \bigwedge_{i \in I_A} A_i \leadsto^{n_i} B_i , $$

and its index set $I_{\min}$. 

Output: Whether the likelihood is a consequence of the sentence for some value of $m$. 

Algorithm: Build up the ordered subset of $I_A$ iteratively. 

1. If $U \wedge \bigwedge_{i \in I_{\min}} \neg A_i \wedge C \wedge \neg B$ is unsatisfiable, return is a consequence for any $m$. 

2. Set $I = \emptyset$. 

3. Repeat, 

(a) Find some j € I a - imin - J such that 

(= 17 Ai € / m<w ->Ai AAjABj A,*€i (Af — ► Bi) —►CAB. 

(b) If some j found, 7 = 7 U {j}. 

(c) Else return not a consequence . 

Until U Ai 6 / m<n ->A, A C A -»B A; e j A, is unsatisfiable or 7 = /,* - 7 m , n . 

4. If the loop terminated only because $I = I_A - I_{\min}$, return not a consequence. 

5. Else return is a consequence for some $m$. 


Figure 5: The likelihood-consequence algorithm 


[Network diagram with nodes Over-7, EngPrf, RdWr and Shakes] 


Figure 6: Dependency network for “can Joe read and write?” 